How to overwrite a factor in R

Time: 2021-05-03 00:04:14

I have a dataset:

> k
       EVTYPE FATALITIES INJURIES
198704   HEAT        583        0
862634   WIND        158     1150
68670    WIND        116      785
148852   WIND        114      597
355128   HEAT         99        0
67884    WIND         90     1228
46309    WIND         75      270
371112   HEAT         74      135
230927   HEAT         67        0
78567    WIND         57      504

The variables are as follows. As per the first answer by joran, unused levels can be dropped with droplevels, so the 898 levels are not a concern. The illustrative k shown here is the complete dataset, obtained from k <- d1[1:10, 3:4], where d1 is the original dataset.

> str(k)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 898 levels "   HIGH SURF ADVISORY",..: 243 NA NA NA 243 NA NA 243 243 NA
 $ FATALITIES: num  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : num  0 1150 785 597 0 ...

I'm trying to overwrite the WIND factor:

> k[k$EVTYPE==factor("WIND"), ]$EVTYPE <- factor("AFDAF")
> k[k$EVTYPE=="WIND", ]$EVTYPE <- factor("AFDAF")

But both commands fail, with messages such as "level sets of factors are different" or "invalid factor level, NA generated".

How should I do this?

1 solution

#1


Try this instead:

k <- droplevels(d1[1:10, 3:5])

Factors (as per the documentation) are simply a vector of integer codes plus a vector of labels, one label per code. These labels are called the "levels". The levels are an attribute, and they persist with your data even when you subset it.
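
As a rough illustration of the point above, the following sketch uses a small made-up factor (the names f, sub, and g are hypothetical, not from the question) to show the integer codes, the levels attribute surviving a subset, and what droplevels does to it. The last line reproduces the kind of assignment that triggers the "invalid factor level" warning.

```r
# A small factor with three levels; only two remain in use after subsetting.
f <- factor(c("HEAT", "WIND", "HAIL", "WIND"))

unclass(f)   # the underlying integer codes, carrying a "levels" attribute
levels(f)    # levels are sorted alphabetically by default

# Subsetting removes the HAIL observation but not the HAIL level.
sub <- f[f != "HAIL"]
levels(sub)

# droplevels() discards levels that no longer occur in the data.
levels(droplevels(sub))

# Assigning a value that is not an existing level produces NA, with the
# warning "invalid factor level, NA generated".
g <- sub
g[g == "WIND"] <- "AFDAF"
g
```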

This is a feature, since for many statistical procedures it is vital to keep track of all the possible values that variable could have, even if they don't appear in the actual data.
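
One simple place where this shows up is tabulation. The sketch below assumes a made-up factor f with a declared but unobserved level HAIL; table() still reports that level, with a count of zero.

```r
# HAIL is a declared level that never occurs in the data.
f <- factor(c("HEAT", "WIND", "WIND"), levels = c("HEAT", "WIND", "HAIL"))

# table() reports every level, including the unobserved one.
counts <- table(f)
counts
```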

Some people find this irritating and run R with options(stringsAsFactors = FALSE).
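
For comparison, here is a minimal sketch of the character-vector route, using three rows copied from the question's data and a hypothetical data frame name d_chr. It passes stringsAsFactors = FALSE directly to read.table rather than setting the global option; note that since R 4.0.0 this is already the default. With plain strings there are no levels to worry about, so an ordinary replacement works.

```r
# Read EVTYPE as a character column instead of a factor.
d_chr <- read.table(text = "EVTYPE FATALITIES
HEAT 583
WIND 158
WIND 116", header = TRUE, stringsAsFactors = FALSE)

is.character(d_chr$EVTYPE)

# Plain strings can be overwritten with any value.
d_chr$EVTYPE[d_chr$EVTYPE == "WIND"] <- "AFDAF"
d_chr$EVTYPE
```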

To simply change the levels, you can do something like this:

d <- read.table(text = "      EVTYPE FATALITIES INJURIES
 198704   HEAT        583        0
 862634   WIND        158     1150
 68670    WIND        116      785
 148852   WIND        114      597
 355128   HEAT         99        0
 67884    WIND         90     1228
 46309    WIND         75      270
 371112   HEAT         74      135
 230927   HEAT         67        0
 78567    WIND         57      504",header = TRUE,sep = "",stringsAsFactors = TRUE)
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "HEAT","WIND": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504
> levels(d$EVTYPE) <- c('A','B')
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "A","B": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504

Or to just change one:

levels(d$EVTYPE)[2] <- 'C'
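
Indexing by position assumes you know where the level sits. A possible variation, sketched here on a small made-up data frame, selects the level by its label instead, which matches the WIND to AFDAF rename from the question. The second part (the hypothetical factor e) shows one way to change only some observations: add the new level first, then assign.

```r
d <- data.frame(EVTYPE = factor(c("HEAT", "WIND", "WIND", "HEAT")))

# Rename a level by label rather than by position; every WIND row changes.
levels(d$EVTYPE)[levels(d$EVTYPE) == "WIND"] <- "AFDAF"
as.character(d$EVTYPE)

# To change individual observations, extend the level set before assigning.
e <- factor(c("HEAT", "WIND", "WIND"))
levels(e) <- c(levels(e), "AFDAF")
e[2] <- "AFDAF"
as.character(e)
```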
