Is there a way to subset all levels of a single factor in one clean swoop?
Case: Suppose you have a data frame where one of the columns is a factor (data$factor), and you want to create subset data frames that each contain only one level of that factor. This is simple to do with separate subset commands when the factor has only a few levels. However, what if it has a large number of levels (e.g. 50+)? Is there a command or a clever way to create all the subsets in that case without having to write 50+ subset commands?
2 solutions
#1
13
The split() function solves this problem without having to write a loop.
Assuming the factor you want to subset (or subgroup) by is in the column "factor" of the data frame "data", do:
subsets<-split(data, data$factor, drop=TRUE)
This will create a list of data frames, one per level of the factor. With drop=TRUE, levels that do not occur in the data are omitted, so the list has as many elements as there are levels actually present.
If you need to put each subset in a separate data frame, you can access them by doing the following:
group1<-subsets[[1]]
group2<-subsets[[2]]
...
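As a minimal sketch of the above on a made-up data frame (the "value" column and the level names "a", "b", "c" are assumptions for illustration only): the list elements are named after the levels, so a subset can also be pulled out by name, and all subsets can be processed together without assigning each one to its own variable.

```r
# Toy data frame; the column "value" and the level names are hypothetical.
data <- data.frame(
  factor = factor(c("a", "b", "a", "c", "b", "a")),
  value  = c(1, 2, 3, 4, 5, 6)
)

# One data frame per level; drop = TRUE omits levels with no rows.
subsets <- split(data, data$factor, drop = TRUE)

names(subsets)    # the level names
subsets[["a"]]    # the rows where factor == "a"

# Apply the same computation to every subset without writing a loop.
row_counts <- sapply(subsets, nrow)
```

Keeping the subsets in the list is usually more convenient than creating 50+ separate variables, since lapply() or sapply() can then operate on all of them at once.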
#2
0
You can create a loop over the requested factor values as follows:
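# min.n (the minimum number of rows a level must have) is assumed to be defined beforehand
vals <- sort(unique(data[["factor"]]))
for (i in seq_along(vals)) {
  keep <- data[["factor"]] == vals[i]   # logical index of the rows in this level
  n <- sum(keep)                        # number of rows in this level
  if (n >= min.n) {
    ...
  }
}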
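The body of the loop is left open in the answer. One possible way to fill it, shown here only as a sketch, is to collect each sufficiently large subset into a named list. The toy data, the threshold value of min.n, and the list name "kept" are all assumptions made for illustration.

```r
# Toy data and threshold; all values here are hypothetical.
data <- data.frame(
  factor = factor(c("a", "b", "a", "c", "b", "a")),
  value  = c(1, 2, 3, 4, 5, 6)
)
min.n <- 2

# Work with the level names as character strings so they can name list elements.
vals <- sort(unique(as.character(data[["factor"]])))

kept <- list()
for (i in seq_along(vals)) {
  keep <- data[["factor"]] == vals[i]   # logical index of the rows in this level
  n <- sum(keep)                        # number of rows in this level
  if (n >= min.n) {
    # Store the subset under the level's name.
    kept[[vals[i]]] <- data[keep, , drop = FALSE]
  }
}
```

Compared with split(), the loop makes it easy to skip levels that have too few rows, at the cost of more code.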