I have a dataset:
> k
EVTYPE FATALITIES INJURIES
198704 HEAT 583 0
862634 WIND 158 1150
68670 WIND 116 785
148852 WIND 114 597
355128 HEAT 99 0
67884 WIND 90 1228
46309 WIND 75 270
371112 HEAT 74 135
230927 HEAT 67 0
78567 WIND 57 504
The variables are as follows. As per the first answer by joran, unused levels can be dropped with droplevels, so don't worry about the 898 levels. The illustrative k I'm showing is the complete dataset obtained from k <- d1[1:10, 3:4], where d1 is the original dataset.
> str(k)
'data.frame': 10 obs. of 3 variables:
$ EVTYPE : Factor w/ 898 levels " HIGH SURF ADVISORY",..: 243 NA NA NA 243 NA NA 243 243 NA
$ FATALITIES: num 583 158 116 114 99 90 75 74 67 57
$ INJURIES : num 0 1150 785 597 0 ...
I'm trying to overwrite the WIND factor level:
> k[k$EVTYPE==factor("WIND"), ]$EVTYPE <- factor("AFDAF")
> k[k$EVTYPE=="WIND", ]$EVTYPE <- factor("AFDAF")
But both commands give me error messages: "level sets of factors are different" or "invalid factor level, NA generated".
How should I do this?
1 solution
#1
1
Try this instead:
k <- droplevels(d1[1:10, 3:5])
A factor (as per the documentation) is simply a vector of integer codes plus a vector of labels, one label per code. The labels are called the "levels". The levels are stored as an attribute and persist with your data even when you subset it.
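To make the codes-plus-labels picture concrete, here is a minimal sketch. The vector f is a hypothetical stand-in for the EVTYPE column, not the original data; it shows that subsetting keeps every level and that droplevels() removes the unused ones.

```r
# Hypothetical toy factor standing in for the question's EVTYPE column.
f <- factor(c("HEAT", "WIND", "WIND", "HEAT", "FLOOD"))

levels(f)        # the labels, sorted alphabetically: "FLOOD" "HEAT" "WIND"
as.integer(f)    # the integer codes behind each element: 2 3 3 2 1

# Subsetting keeps every level, even ones that no longer occur.
g <- f[f != "FLOOD"]
levels(g)        # still "FLOOD" "HEAT" "WIND"

# droplevels() discards the levels that have no observations left.
h <- droplevels(g)
levels(h)        # "HEAT" "WIND"
```

This is why the 10-row subset in the question still reports 898 levels until droplevels() is applied.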
This is a feature, since for many statistical procedures it is vital to keep track of all the possible values that a variable could have, even if they don't appear in the actual data.
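One place this shows up is tabulation. The sketch below uses a made-up factor ev in which "COLD" is a declared level with no observations; table() still reports it with a count of zero, and the row disappears only after droplevels().

```r
# Hypothetical example: "COLD" is a possible event type with no rows yet.
ev <- factor(c("HEAT", "WIND", "WIND"),
             levels = c("COLD", "HEAT", "WIND"))

counts <- table(ev)
counts                   # COLD is listed with a count of 0

# After dropping unused levels, the empty category is gone.
counts_dropped <- table(droplevels(ev))
counts_dropped
```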
Some people find this irritating and run R using options(stringsAsFactors = FALSE).
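For comparison, here is a small sketch of the character-column route, using a hypothetical two-row data frame d2 rather than the original data. With strings left as plain characters, an ordinary subset assignment is enough and no level bookkeeping is involved. (In R 4.0.0 and later, stringsAsFactors = FALSE is already the default for read.table() and data.frame().)

```r
# Hypothetical two-row stand-in for the question's data.
d2 <- read.table(text = "EVTYPE FATALITIES
HEAT 583
WIND 158", header = TRUE, stringsAsFactors = FALSE)

class(d2$EVTYPE)   # "character", not "factor"

# Plain character vectors accept any new value directly.
d2$EVTYPE[d2$EVTYPE == "WIND"] <- "AFDAF"
d2$EVTYPE
```

The trade-off is that the column no longer records the set of possible values, so convert back with factor() if a later procedure needs that.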
To simply change the levels, you can do something like this:
d <- read.table(text = " EVTYPE FATALITIES INJURIES
198704 HEAT 583 0
862634 WIND 158 1150
68670 WIND 116 785
148852 WIND 114 597
355128 HEAT 99 0
67884 WIND 90 1228
46309 WIND 75 270
371112 HEAT 74 135
230927 HEAT 67 0
78567 WIND 57 504",header = TRUE,sep = "",stringsAsFactors = TRUE)
> str(d)
'data.frame': 10 obs. of 3 variables:
$ EVTYPE : Factor w/ 2 levels "HEAT","WIND": 1 2 2 2 1 2 2 1 1 2
$ FATALITIES: int 583 158 116 114 99 90 75 74 67 57
$ INJURIES : int 0 1150 785 597 0 1228 270 135 0 504
> levels(d$EVTYPE) <- c('A','B')
> str(d)
'data.frame': 10 obs. of 3 variables:
$ EVTYPE : Factor w/ 2 levels "A","B": 1 2 2 2 1 2 2 1 1 2
$ FATALITIES: int 583 158 116 114 99 90 75 74 67 57
$ INJURIES : int 0 1150 785 597 0 1228 270 135 0 504
Or to just change one:
levels(d$EVTYPE)[2] <- 'C'
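Indexing by position works when you know the order of the levels. As a sketch of the same idea applied to the question's goal, the level can also be selected by name. The data frame d3 below is a hypothetical stand-in; the original attempts failed because they assigned a value ("AFDAF") that was not among the existing levels, which R turns into NA.

```r
# Hypothetical stand-in for the question's data.
d3 <- data.frame(EVTYPE = factor(c("HEAT", "WIND", "WIND", "HEAT")),
                 FATALITIES = c(583, 158, 116, 99))

# Rename the level itself; every row carrying that level follows along.
levels(d3$EVTYPE)[levels(d3$EVTYPE) == "WIND"] <- "AFDAF"

levels(d3$EVTYPE)          # "HEAT" "AFDAF"
as.character(d3$EVTYPE)    # "HEAT" "AFDAF" "AFDAF" "HEAT"
```

Selecting by name avoids depending on the alphabetical position of the level, which can shift when the data change.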