Defining the levels of factors in a data frame

Date: 2021-12-24 07:36:27

Suppose you have a data.frame with a number of factors with varying numbers of levels:

V1<-factor(sample(c(1:5,9),100,TRUE))
V2<-factor(sample(c(1:5,9),100,TRUE))
V3<-factor(sample(c(1:5),100,TRUE))
V4<-factor(sample(c(1:5),100,TRUE))
dat<-data.frame(V1,V2,V3,V4)

The goal is to estimate the difference in level frequencies between two factors. However, because the factors have different numbers of levels, the tables built from V1/V2 and from V3/V4 have different lengths and are not conformable, e.g.:

table(dat$V1)-table(dat$V3)
Error in table(dat$V1) - table(dat$V3) : non-conformable arrays
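
The error only appears when the two tables differ in length. If two factors happen to have the same number of levels but different level labels, the subtraction runs without an error and pairs the counts by position. A minimal sketch with made-up vectors `a` and `b` (hypothetical, not from the question) shows this, along with a cheap check of the level labels before subtracting:

```r
# Hypothetical factors: same number of levels, different labels
a <- factor(c(1, 2, 9))     # levels "1", "2", "9"
b <- factor(c(1, 2, 3, 3))  # levels "1", "2", "3"

ta <- table(a)
tb <- table(b)

# Both tables have 3 cells, so R treats them as conformable and
# subtracts cell by cell; the result keeps the labels of ta
d <- ta - tb
print(d)  # the "9" cell of ta has been paired with the "3" cell of tb

# Guard to run before subtracting: the level labels must match
same_levels <- identical(names(ta), names(tb))
print(same_levels)  # FALSE
```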

The goal is to give V3 and V4 the same levels as V1 and V2 so that the subtraction is valid. One option is:

dat$V3<-factor(dat$V3,levels=c('1','2','3','4','5','9'))
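
Passing the full `levels` vector is what makes the tables line up: `table()` reports one cell per declared level, including levels that never occur, so every variable recoded this way yields the same six cells. A small sketch with a made-up vector `x`:

```r
# Hypothetical factor that uses only some of the possible values
x <- factor(c(1, 2, 2, 5))

# Re-declare it with the full level set used by V1 and V2
all_levs <- c('1', '2', '3', '4', '5', '9')
y <- factor(x, levels = all_levs)

# One cell per level; unused levels "3", "4" and "9" get a count of 0
print(table(y))
```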

However, this requires setting the factor levels for each variable individually, which is impractical when there are many variables, say V5, ..., Vn. I thought that

dat[,3:4]<-apply(dat[,3:4],2,factor,levels=c('1','2','3','4','5','9'))

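would work more generally, but afterwards is.factor(dat$V3) is FALSE: apply() coerces the data frame to a matrix and returns a matrix, and a matrix cannot hold factors, so the factor class is lost.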
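
A minimal sketch of why this happens, using a small made-up data frame `d`: `apply()` calls `as.matrix()` on the data frame first, and since a matrix cannot carry the factor class, the recoded columns come back as a character matrix. `lapply()` instead walks over the columns as a list and returns factors, and assigning with `d[] <-` keeps the data-frame structure.

```r
levs <- c('1', '2', '3', '4', '5', '9')
# Hypothetical two-column data frame of factors
d <- data.frame(V3 = factor(c(1, 2, 3)), V4 = factor(c(4, 5, 5)))

# apply() works on as.matrix(d) and simplifies to a matrix: factor class lost
m <- apply(d, 2, factor, levels = levs)
print(is.matrix(m))  # TRUE
print(typeof(m))     # "character"

# lapply() returns a list of factors; d[] <- keeps d a data.frame
d[] <- lapply(d, factor, levels = levs)
print(sapply(d, is.factor))
print(levels(d$V3))
```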

EDIT: This function may complement the answer by SimonO101 by applying it to a range of columns:

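correct_factors <- function(df_object, range) {
  if (!is.data.frame(df_object)) stop('Requires data.frame object')
  # Columns range[1] to range[2]; single-bracket indexing keeps a
  # data.frame even when only one column is selected
  cols <- range[1]:range[2]
  # Union of all levels found in those columns
  levs <- unique(unlist(lapply(df_object[cols], levels)))
  # Rebuild each column as a factor with the shared level set
  df_object[cols] <- lapply(df_object[cols], factor, levels = levs)
  df_object
}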
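
A possible variant, sketched under the assumption that you would rather pass a vector of column names or indices than a start/end pair. The name `harmonise_levels` is hypothetical and not part of the original answer. Indexing with `df_object[cols]` rather than `df_object[, cols]` keeps the selection a data frame even when only one column is chosen.

```r
# Hypothetical helper: give the selected factor columns one shared level set.
# cols may be column names or integer indices (default: all columns).
harmonise_levels <- function(df_object, cols = names(df_object)) {
  if (!is.data.frame(df_object)) stop('Requires data.frame object')
  # Union of levels, in order of first appearance across the columns
  levs <- unique(unlist(lapply(df_object[cols], levels)))
  # Rebuild each selected column as a factor with that level set
  df_object[cols] <- lapply(df_object[cols], factor, levels = levs)
  df_object
}

# Made-up example data
dat <- data.frame(
  V1 = factor(c(1, 9, 2)),
  V3 = factor(c(1, 3, 3)),
  V4 = factor(c(2, 4, 5))
)

out <- harmonise_levels(dat, c("V1", "V3"))
print(levels(out$V3))  # "1" "2" "9" "3"
print(table(out$V1) - table(out$V3))
```

Because `unique()` keeps levels in order of first appearance, "3" ends up after "9" in this example; wrapping the union in `sort()` gives a lexicographic order instead, which for character levels puts "10" before "2". Column V4 is not selected and keeps its own levels.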

1 solution

#1

Try this to harmonise the levels...

#  Get vector of all levels that appear in the data.frame
levs <- unique( unlist( lapply( dat , levels ) ) )

#  Set these as the levels for each column    
dat2 <- data.frame( lapply( dat , factor , levels = levs ) )

table(dat2$V1)-table(dat2$V3)
#  1   2   3   4   5   9 
#-15  -5   4   7  -5  14 
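
Once every column shares the same level set, all the tables have identical cells, so `sapply()` can collect them into a single levels-by-columns count matrix and any pair of columns can be differenced by name. A small sketch with made-up data (not the random `dat` from the question):

```r
# Made-up example data
dat <- data.frame(
  V1 = factor(c(1, 9, 9, 2)),
  V2 = factor(c(2, 2, 5, 9)),
  V3 = factor(c(1, 1, 2, 3))
)

# Harmonise the levels as in the answer above
levs <- unique(unlist(lapply(dat, levels)))
dat2 <- data.frame(lapply(dat, factor, levels = levs))

# Each table has length(levs) cells, so sapply() simplifies to a matrix:
# rows are levels, columns are the original variables
counts <- sapply(dat2, table)
print(counts)

# Differences between any two variables, matched by level
print(counts[, "V1"] - counts[, "V3"])
```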