I have a data frame that I've discretized using RWeka. RWeka's discretization creates bin labels wrapped in single quotes. They don't cause any problems, but a variable with an 'All' category looks ugly in plots.
Here's the discretized data frame:
structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L,
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast",
"rainy"), class = "factor"), temperature = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes",
"no"), class = "factor")), .Names = c("outlook", "temperature",
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")
How can I remove the single quotes from the data and recreate the factors?
1 Solution
#1
This should do it:
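# gsub() returns a character vector, so wrap it in factor() to recreate the factors
df$temperature <- factor(gsub("'", "", df$temperature, fixed = TRUE))
df$humidity <- factor(gsub("'", "", df$humidity, fixed = TRUE))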
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no
If you need to do the same over several columns, this avoids repeating the call for each one:
# lapply() keeps each column separate; apply() would coerce to a character matrix
df[2:3] <- lapply(df[2:3], function(x) {
  factor(gsub("'", "", x, fixed = TRUE))
})
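As an alternative sketch (not part of the answer above), the quotes can be stripped from the level labels directly, so the columns never stop being factors. The small `df` built here is only a stand-in for the discretized data, and `is_fac` is a helper name introduced for illustration.

```r
# Stand-in for the discretized data frame (same quoting pattern as the question).
df <- data.frame(
  outlook     = factor(c("sunny", "overcast", "rainy")),
  temperature = factor(rep("'All'", 3)),
  humidity    = factor(rep("'All'", 3)),
  windy       = c(FALSE, TRUE, FALSE)
)

# Pick out the factor columns, then rewrite only their level labels.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], function(x) {
  levels(x) <- gsub("'", "", levels(x), fixed = TRUE)
  x
})

str(df)
```

Because only `levels()` is modified, the underlying integer codes are untouched and non-factor columns such as `windy` are left alone.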