Remove quotes from the elements of a data frame

Time: 2021-02-17 20:09:26

I have a data frame that I've discretized using RWeka. RWeka's discretization creates bin labels wrapped in single quotes. They don't cause any problems, but a variable with an 'All' category (quotes included) looks ugly in plots.

Here's the discretized data frame:

structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast", 
"rainy"), class = "factor"), temperature = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"), 
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"), 
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L, 
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes", 
"no"), class = "factor")), .Names = c("outlook", "temperature", 
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")

How can I remove the single quotes from the data and recreate the factors?

1 solution

#1


This should do it:

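# gsub() returns a character vector, so wrap it in factor() to recreate the factor
df$temperature <- factor(gsub("'", "", df$temperature, fixed = TRUE))
df$humidity <- factor(gsub("'", "", df$humidity, fixed = TRUE))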
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no

If you need to do the same over several columns, you can process them in one call instead of repeating the line for each column:

# lapply() works column by column; apply() would return a character matrix
df[2:3] <- lapply(df[2:3], function(x) {
    factor(gsub("'", "", x, fixed = TRUE))
})
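
Since the question asks to keep the columns as factors, another option is to edit the factor levels directly instead of converting the values. The following is only a sketch under assumptions: the small df here is a stand-in for the discretized data frame, and strip_quotes is a hypothetical helper name, not something provided by RWeka or base R.

```r
# Small stand-in for the discretized data frame (two of its columns).
df <- data.frame(
  outlook = factor(c("sunny", "overcast", "rainy")),
  temperature = factor(c("'All'", "'All'", "'All'"))
)

# Hypothetical helper: strip single quotes from the levels of one factor.
# Only the level labels change, so the result is still a factor.
strip_quotes <- function(f) {
  levels(f) <- gsub("'", "", levels(f), fixed = TRUE)
  f
}

# Apply the helper to every factor column; other columns are left untouched.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], strip_quotes)

levels(df$temperature)
```

Because only the levels are rewritten, this avoids the round trip through character vectors and does not depend on column positions such as 2:3.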
