Merging the data frames in a list with each other

Time: 2021-04-28 18:37:01

What I need:

I have a huge data frame with the following columns (and some more, but these are not important). Here's an example:

    user_id video_id group_id    x   y
1         1        0        0   39 108
2         1        0        0   39 108
3         1       10        0  135 180
4         2        0        0   20 123

User, video and group IDs are factors, of course. For example, there are 20 videos, but each of them has several "observations" for each user and group.

I'd like to transform this data frame into the following format, with one pair of columns x.N, y.N for each user N.

video_id  x.1   y.1  x.2  y.2  …
       0   39   108   20  123

So, for video 0, the x and y values from user 1 are in columns x.1 and y.1, respectively. For user 2, their values are in columns x.2, y.2, and so on.

What I've tried:

I built a list of data frames, one per user_id, each containing only the unique video_id, x, y observations for that user:

library(plyr)
summaryList <- dlply(allData, .(user_id), function(x) unique(x[c("video_id","x","y")]) )

This is what it looks like:

List of 15
 $ 1 :'data.frame': 20 obs. of  3 variables:
  ..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 1 11 8 5 12 9 20 13 7 10 ...
  ..$ x       : int [1:20] 39 135 86 122 28 167 203 433 549 490 ...
  ..$ y       : int [1:20] 108 180 164 103 187 128 185 355 360 368 ...
 $ 2 :'data.frame': 20 obs. of  3 variables:
  ..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 2 14 15 4 20 6 19 3 13 18 ...
  ..$ x       : int [1:20] 128 688 435 218 528 362 299 134 83 417 ...
  ..$ y       : int [1:20] 165 117 135 179 96 328 332 563 623 476 ...

Where I'm stuck:

What's left to do is:

  • Merge each data frame from the summaryList with each other, based on the video_id. I can't find a nice way to access the actual data frames in the list, which are summaryList[1]$`1`, summaryList[2]$`2`, et cetera.

    @James found out a partial solution:

    Reduce(function(x,y) merge(x,y,by="video_id"),summaryList)
    
  • Ensure the column names are renamed after the user ID and not kept as-is. Right now my summaryList doesn't contain any info about the user ID, and the output of Reduce has duplicate column names like x.x y.x x.y y.y x.x y.x and so on.

How do I go about doing this? Or is there any easier way to get to the result than what I'm currently doing?
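
For reference on the first point: list elements are normally extracted with `[[`, either by position or by name, rather than with `[` followed by `$`. The following is only a minimal sketch; the toy `summaryList` is a hypothetical stand-in for the real one, with made-up rows modeled on the example above.

```r
# Toy stand-in for summaryList (hypothetical values, two users, two videos).
summaryList <- list(
  "1" = data.frame(video_id = factor(c(0, 10)), x = c(39, 135), y = c(108, 180)),
  "2" = data.frame(video_id = factor(c(0, 10)), x = c(20, 128), y = c(123, 165))
)

# `[` returns a sub-list of length one; `[[` returns the element itself.
sub_list  <- summaryList[1]      # still a list
by_pos    <- summaryList[[1]]    # the data frame, by position
by_name   <- summaryList[["1"]]  # the same data frame, by user ID

# Walk over all data frames together with their user IDs.
for (uid in names(summaryList)) {
  df <- summaryList[[uid]]
  cat("user", uid, "has", nrow(df), "rows\n")
}
```

Since the list is named by user ID, `names(summaryList)` is also where the user information for renaming the columns can come from.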

2 answers

#1


3  

Reduce does the trick:

reducedData <- Reduce(function(x,y) merge(x,y,by="video_id"),summaryList)

… but you need to fix the names afterwards:

names(reducedData)[-1] <- do.call(function(...) paste(...,sep="."),expand.grid(letters[24:25],names(summaryList)))

The result is:

   video_id  x.1 y.1  x.2 y.2  x.3 y.3  x.4 y.4  x.5 y.5  x.6 y.6  x.7 y.7  x.8
1         0   39 108  899 132   61 357  149 298 1105 415  148 208  442 200  210
2         1 1125  70  128 165 1151 390  171 587  623 623   80 643  866 310  994
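
A possible variation, sketched here under assumptions rather than taken from the original answer: rename the columns before merging, so that merge never sees clashing names and the positional fix-up is not needed. The toy `summaryList` below is hypothetical (three users, values copied from the first columns of the result above), and `renamed` is a helper name introduced only for this illustration.

```r
# Toy stand-in for summaryList: one data frame per user, named by user ID,
# as dlply() would produce. Values copied from the result table above.
summaryList <- list(
  "1" = data.frame(video_id = factor(c(0, 1)), x = c(39, 1125), y = c(108, 70)),
  "2" = data.frame(video_id = factor(c(0, 1)), x = c(899, 128), y = c(132, 165)),
  "3" = data.frame(video_id = factor(c(0, 1)), x = c(61, 1151), y = c(357, 390))
)

# Append ".<user_id>" to every column except the merge key.
renamed <- Map(function(df, uid) {
  is_key <- names(df) == "video_id"
  names(df)[!is_key] <- paste(names(df)[!is_key], uid, sep = ".")
  df
}, summaryList, names(summaryList))

# Column names are already unique, so merge() adds no .x/.y suffixes.
reducedData <- Reduce(function(a, b) merge(a, b, by = "video_id"), renamed)
names(reducedData)
```

Because the names are attached to each data frame before the merge, this does not rely on the merged columns coming out in user order.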

#2


4  

I am still somewhat confused. However, I guess you simply want to melt and dcast.

library(reshape2)
d <- melt(allData,id.vars=c("user_id","video_id"), measure.vars=c("x","y"))
dcast(d,video_id~user_id+variable,value.var="value",fun.aggregate=mean)

Resulting in:

 video_id  1_x 1_y  2_x 2_y  3_x 3_y  4_x 4_y  5_x 5_y  6_x 6_y  7_x 7_y  8_x 8_y  9_x 9_y 10_x 10_y 11_x 11_y 12_x 12_y 14_x 14_y 15_x 15_y 16_x 16_y
1         0   39 108  899 132   61 357  149 298 1105 415  148 208  442 200  210 134   58 244  910  403  152   52 1092  617 1012  114 1105  424  548  394
2         1 1125  70  128 165 1151 390  171 587  623 623   80 643  866 310  994 114  854 129  781  306  672   -1 1096  354  525  524  150 
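
If the x.N, y.N naming from the question matters, base R's `reshape()` may be another route. This is only a minimal sketch, assuming that after dropping duplicates there is a single (x, y) pair per user and video; the toy `allData` below is hypothetical and merely mimics the structure shown in the question.

```r
# Toy stand-in for allData (hypothetical rows, including one duplicated
# observation, modeled on the question's example).
allData <- data.frame(
  user_id  = factor(c(1, 1, 1, 2, 2)),
  video_id = factor(c(0, 0, 1, 0, 1)),
  group_id = factor(c(0, 0, 0, 0, 0)),
  x = c(39, 39, 1125, 899, 128),
  y = c(108, 108, 70, 132, 165)
)

# Drop unrelated columns and duplicate observations first.
long <- unique(allData[c("user_id", "video_id", "x", "y")])

# One row per video_id; x and y are spread into x.<user_id>, y.<user_id>.
wide <- reshape(long, idvar = "video_id", timevar = "user_id",
                direction = "wide")
names(wide)
```

Unlike the `dcast` call, this does not average repeated observations, so it only fits when the duplicates really are identical rows.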
