Remove all characters outside the punctuation marks

Time: 2022-06-12 22:19:01

I have a dataset that looks something like the following:

ID    Type                 Count
1     **Radisson**             8
2     **Renaissance**          9
3     **Hilton** New York Only 8
4     **Radisson** East Cost   8

I want to get a dataset that looks like this:

ID    Type                 Count
1     **Radisson**             8
2     **Renaissance**          9
3     **Hilton**               8
4     **Radisson**             8

Or even without the asterisks, if at all possible.

Any solutions?

3 Solutions

#1


3  

You could use gsub to drop everything that follows the pair of double asterisks at the start of each string.

df <- data.frame(Type = c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
                          "**Radisson** East Cost"),
                 Count = c(8, 9, 8, 8))

gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)

[1] "**Radisson**"    "**Renaissance**" "**Hilton**"      "**Radisson**" 

So ...

df$Type <- gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)
df

             Type Count
1    **Radisson**     8
2 **Renaissance**     9
3      **Hilton**     8
4    **Radisson**     8
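
If the asterisks should be dropped as well, as the question asks, the same idea can be adapted by moving the capture group inside the stars. The following is only a minimal sketch and not part of the original answer: the vector name `types` is made up for illustration, and the lazy `.*?` assumes the wanted text ends at the first closing `**`.

```r
# Hypothetical example vector mirroring the Type column from the question
types <- c("**Radisson**", "**Renaissance**",
           "**Hilton** New York Only", "**Radisson** East Cost")

# Capture only the text between the first pair of double asterisks.
# The lazy quantifier .*? stops at the first closing "**", and the
# trailing .* swallows whatever follows so that it is dropped.
stripped <- sub("^\\*{2}(.*?)\\*{2}.*$", "\\1", types, perl = TRUE)
print(stripped)
```

Strings that do not start with `**` are returned unchanged by `sub`, so it may be worth checking for such rows before overwriting the column.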

#2


0  

One solution is to use strsplit on ** and pick the second element:

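# as.character() guards against Type being a factor (the data.frame default before R 4.0)
df$Type <- sapply(strsplit(as.character(df$Type), split = "\\*{2}"), function(x) x[2])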
df
#   ID        Type Count
# 1  1    Radisson     8
# 2  2 Renaissance     9
# 3  3      Hilton     8
# 4  4    Radisson     8
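
To see why the second element is the one to pick, it can help to look at what strsplit returns for a couple of the strings. This is a small illustrative sketch; the names `parts` and `names_only` are hypothetical, and `vapply` is used here in place of `sapply` only to make the return type explicit.

```r
# Hypothetical example strings taken from the question's Type column
parts <- strsplit(c("**Radisson**", "**Hilton** New York Only"),
                  split = "\\*{2}")
print(parts)

# Each string starts with the delimiter, so the first piece is the
# empty string "" and the name itself is the second piece.
names_only <- vapply(parts, function(x) x[2], character(1))
print(names_only)
```

A string that contains no `**` at all would yield a single piece, so `x[2]` would be `NA` for that row.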

#3


0  

Here is an option with str_extract:

library(stringr)
library(dplyr)
df %>% 
   mutate(Type = str_extract(Type, "[*]*[^*]*[*]*"))
#              Type Count
#1    **Radisson**     8
#2 **Renaissance**     9
#3      **Hilton**     8
#4    **Radisson**     8
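
For the variant without the asterisks, one possibility within stringr is `str_match` with a capture group. This is a hedged sketch rather than the answerer's method; `types` and `bare` are illustrative names, and the pattern assumes the name itself contains no `*`.

```r
library(stringr)

# Hypothetical example vector mirroring the Type column from the question
types <- c("**Radisson**", "**Renaissance**",
           "**Hilton** New York Only", "**Radisson** East Cost")

# str_match returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group (the text between the stars).
bare <- str_match(types, "\\*{2}([^*]+)\\*{2}")[, 2]
print(bare)
```

Inside a dplyr pipeline the same expression could be placed in `mutate(Type = ...)`; rows without a `**...**` pair would come back as `NA`.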
