Split a dataframe using two columns of data and apply a common transformation to the resulting list of dataframes

Date: 2022-05-11 15:51:10

I want to split a large dataframe into a list of dataframes according to the values in two columns. I then want to apply a common data transformation (a lag transformation) to every dataframe in the resulting list. I'm aware of the split command, but I can only get it to work on one column of data at a time.

2 answers

#1


40  

You need to put all the factors you want to split by in a list, e.g.:

split(mtcars,list(mtcars$cyl,mtcars$gear))

Then you can use lapply on the resulting list to apply whatever transformation you need to each dataframe.
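As a minimal sketch of that second step, assuming the "lag transformation" means shifting a column down by one row within each group. The helper name `lag_by_one` and the new column `mpg_lag` are made up for illustration, and `drop = TRUE` is added here to discard empty factor combinations (mtcars has no 8-cylinder, 4-gear cars).

```r
# Hypothetical helper: shift a vector down by one position, padding with NA.
lag_by_one <- function(x) c(NA, head(x, -1))[seq_along(x)]

# Split on both columns; drop = TRUE removes combinations with no rows.
pieces <- split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)

# Apply the same transformation to every data frame in the list.
lagged <- lapply(pieces, function(d) {
  d$mpg_lag <- lag_by_one(d$mpg)  # mpg_lag is an illustrative column name
  d
})

names(lagged)          # one element per cyl.gear combination present in the data
head(lagged[["4.4"]])  # the first row of each piece has NA in mpg_lag
```

If a single data frame is wanted afterwards, `do.call(rbind, lagged)` stacks the pieces back together.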

#2


6  

How about this one:

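 library(plyr)
 # stats::lag() does not shift a plain vector, so shift by hand within each group
 ddply(df, .(category1, category2), transform,
       value1 = c(NA, head(value1, -1)), value2 = c(NA, head(value2, -1)))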

This seems like an excellent job for the plyr package and its ddply() function. If there are still open questions, please provide some sample data. Splitting should work on several columns as well:

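df <- data.frame(value = rnorm(100), class1 = factor(rep(c('a','b'), each = 50)), class2 = factor(rep(c('1','2'), 50)))
# Pass the grouping factors as a list; c() would concatenate them into one vector of length 200 that no longer lines up with the 100 rows
g <- list(df$class1, df$class2)
split(df$value, g)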
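If the goal is a lagged column in the original dataframe rather than a list of pieces, base R's `ave()` may be a simpler route: it applies a function within the groups defined by several factors and returns the result in the original row order. The sketch below reuses the `df` layout from the snippet above; `lag_by_one` and `value_lag` are illustrative names, not part of any package.

```r
set.seed(1)  # only to make the illustrative data reproducible
df <- data.frame(value  = rnorm(100),
                 class1 = factor(rep(c("a", "b"), each = 50)),
                 class2 = factor(rep(c("1", "2"), 50)))

# Hypothetical helper: shift a vector down by one position, padding with NA.
lag_by_one <- function(x) c(NA, head(x, -1))[seq_along(x)]

# ave() groups value by class1 x class2, applies FUN to each group,
# and puts the results back in the original row order.
df$value_lag <- ave(df$value, df$class1, df$class2, FUN = lag_by_one)

head(df)
```

Each of the four class1/class2 groups contributes one leading NA, and every other row holds the previous value from the same group.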
