In R, reorganize a list based on element names (rbind and indicator variable)

I am trying to reorganize my data, which is basically a list of data.frames. Its elements represent subjects of interest (A and B), with observations on x and y, collected on two occasions (1 and 2). I am trying to turn this into a list of data.frames, one per subject, where the occasion on which x and y were collected is stored in each data.frame as a new variable rather than in the element name:

library('rlist')

A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))

list <- list(A1=A1,A2=A2,B1=B1,B2=B2)

A <- do.call(rbind,list.match(list,"A"))
B <- do.call(rbind,list.match(list,"B"))

list <- list(A=A,B=B)
list <- lapply(list,function(x) {
      y <- data.frame(x)
      y$class <- c(rep.int(1,2),rep.int(2,2))
      return(y)
})

> list
$A
      x  y class
A1.1 66 96     1
A1.2 76 58     1
A2.1 50 93     2
A2.2 57 12     2

$B
      x  y class
B1.1 58 56     1
B1.2 69 15     1
B2.1 77 77     2
B2.2  9  9     2

In my real-world problem there are about 500 subjects, not always two occasions, and differing numbers of observations.

So my example above is just to illustrate where I want to get. I am stuck on how to tell do.call(rbind, ...) to bind the subject-specific elements together into new list elements, based on the element names, while assigning a new variable.

To me, this is a somewhat fuzzy task, and the closest I got was the rlist package. This question is related but uses unique to identify elements, whereas in my case it seems to be more of a regex problem.
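The regex part can be handled with a single pattern that has two capture groups, one for the subject and one for the occasion. This is a minimal sketch that assumes every element name is a run of non-digits followed by a run of digits; the names below (including "AB10") are hypothetical, and names that do not match the pattern are returned unchanged by sub().

```r
# Hypothetical element names: non-digits (subject) followed by digits (occasion)
nms <- c("A1", "A2", "B1", "B2", "AB10")

# Group 1 = leading non-digits, group 2 = trailing digits
pattern  <- "^([^0-9]+)([0-9]+)$"
subject  <- sub(pattern, "\\1", nms)
occasion <- as.integer(sub(pattern, "\\2", nms))

data.frame(name = nms, subject = subject, occasion = occasion)
```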

I'd even be happy with tips on what to search for on Google, keywords for further research, etc.

2 Answers

#1

It sounds like you're doing a lot of gymnastics because you have a specific form in mind. What I would suggest is first trying to make the data tidy. If you don't want to read the link, the quick summary is to put your data into a single data frame, where it can be easily processed.

The quick version of the answer (here I've used lst instead of list for the name to avoid confusion with the built-in list) is to do this:

do.call(rbind,
  lapply(seq_along(lst), function(i) {
    # tag every row of the i-th data frame with that element's name
    lst[[i]]$type <- names(lst)[i]
    lst[[i]]
  })
)

What this will do is create a single data frame, with a column, "type", that contains the name of the list item in which that row appeared.
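The same single data frame can also be built without the index loop: bind everything first, then repeat each element name once per row of its element. This is an alternative sketch, not the answer's own code, and the toy data below is made up.

```r
lst <- list(A1 = data.frame(x = c(1.5, 2.5)),
            A2 = data.frame(x = 3.5),
            B  = data.frame(x = c(4.5, 5.5)))

# Bind all data frames at once; rows keep the order of the list elements
df <- do.call(rbind, lst)

# Repeat each element name once per row of that element
df$type <- rep(names(lst), times = vapply(lst, nrow, integer(1)))
df
```

This relies on rbind keeping the rows in list order, which is why the repeated names line up with the rows.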

Using a slightly simplified version of your initial data:

lst <- list(A1=data.frame(x=rnorm(5)), A2=data.frame(x=rnorm(3)), B=data.frame(x=rnorm(5)))
lst
$A1
           x
1  1.3386071
2  1.9875317
3  0.4942179
4 -0.1803087
5  0.3094100

$A2
           x
1 -0.3388195
2  1.1993115
3  1.9524970

$B
           x
1 -0.1317882
2 -0.3383545
3  0.8864144
4  0.9241305
5 -0.8481927

And then applying the magic function:

df <- do.call(rbind,
  lapply(seq_along(lst), function(i) {
    lst[[i]]$type <- names(lst)[i]; lst[[i]]
  })
)
df
            x type
1   1.3386071   A1
2   1.9875317   A1
3   0.4942179   A1
4  -0.1803087   A1
5   0.3094100   A1
6  -0.3388195   A2
7   1.1993115   A2
8   1.9524970   A2
9  -0.1317882    B
10 -0.3383545    B
11  0.8864144    B
12  0.9241305    B
13 -0.8481927    B

From here we can process the data to our heart's content: operations like df$subject <- gsub("[0-9]+", "", df$type) extract the non-numeric portion of type, and tools like split can be used to generate the sub-lists that you mention in your question.
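A sketch of those two steps together, assuming element names are letters followed by digits: derive subject and occasion from type, then split the rows by subject so that each subject gets its own data frame with the occasion as an ordinary column. The toy data frame stands in for the combined result and is made up.

```r
# Toy version of the combined data frame: one row per observation,
# "type" holds the original list-element name
df <- data.frame(x    = c(66, 76, 50, 57, 58, 69),
                 y    = c(96, 58, 93, 12, 56, 15),
                 type = c("A1", "A1", "A2", "A2", "B1", "B2"))

# Assumes names are letters followed by digits
df$subject  <- gsub("[0-9]+", "", df$type)                # "A1" -> "A"
df$occasion <- as.integer(gsub("[^0-9]+", "", df$type))   # "A1" -> 1

# One data frame per subject, with the occasion stored as a column
by_subject <- split(df[, c("x", "y", "occasion")], df$subject)
by_subject
```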

In addition, once it is in this form, you can use functions like by and aggregate, or packages like dplyr or data.table, to do more advanced split-apply-combine operations for data analysis.
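As a small base-R illustration of split-apply-combine on the long format, aggregate() can summarise per subject and occasion, and by() runs a function on each subject's rows. The data and the summaries chosen (mean of x, row counts) are only assumptions for the example.

```r
# Toy long-format data (values are made up)
df <- data.frame(x        = c(1, 3, 5, 10, 20),
                 subject  = c("A", "A", "A", "B", "B"),
                 occasion = c(1, 1, 2, 1, 2))

# Mean of x for every subject/occasion combination
means <- aggregate(x ~ subject + occasion, data = df, FUN = mean)
means

# Number of rows per subject
counts <- by(df, df$subject, function(d) nrow(d))
counts
```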

#2

From the data you provided:

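# occasion number: strip the leading capital letters ("A1" -> "1")
subj <- sub("[A-Z]*", "", names(lst))
# add it to each data frame as a new "class" column
newlst <- Map(function(x, y) {x[,"class"] <- y;x}, lst, subj)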

First we call sub() with a regular expression to isolate the number that will go in the class column. Here I matched the capital letters and erased them, leaving the number, so "A1" becomes "1". Please note that your real names will probably need a different regex pattern.
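Two details of this step, shown as a quick sketch: sub() returns character strings, so the class column holds "1" rather than 1 unless you convert it, and a pattern anchored on the digits is less sensitive to how the letters in a name look. The longer names below are hypothetical.

```r
nms <- c("A1", "A2", "B1", "B2")

# The answer's pattern: drop the leading capitals, keep the digits (as character)
subj <- sub("[A-Z]*", "", nms)

# Convert if a numeric "class" column is wanted
as.integer(subj)

# Hypothetical real names with lowercase letters: strip everything up to
# the first digit instead of a particular set of letters
real_nms <- c("Subj1", "Subj12", "ctrl3")
sub("^[^0-9]+", "", real_nms)
```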

Then we use Map to add a new column to each data frame and save the result to a new list called newlst. Map applies the function to the first elements of all its arguments, then to the second elements, and so on. So the first data frame in lst is paired with the first number in subj, the second with the second, etc. The anonymous function I used is function(x, y) {x[, "class"] <- y; x}. It takes two arguments: the first is the data frame, the second is the column value.
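To see the pairing concretely, here is a tiny sketch with a named helper (add_class is a hypothetical name, equivalent to the anonymous function above). Map is documented as a simple wrapper around mapply that never simplifies its result, so both calls give the same list.

```r
# Two small data frames and one value per data frame (toy data)
dfs  <- list(A1 = data.frame(x = 1:2), A2 = data.frame(x = 3:4))
vals <- c("1", "2")

# Hypothetical helper: add a constant "class" column to one data frame
add_class <- function(x, y) { x[, "class"] <- y; x }

# Calls add_class(dfs[[1]], vals[1]), then add_class(dfs[[2]], vals[2])
tagged <- Map(add_class, dfs, vals)

# The same thing written with mapply()
same <- mapply(add_class, dfs, vals, SIMPLIFY = FALSE)
identical(tagged, same)
```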

Now it's much easier to move forward. We create a vector called uniq.nmes that holds the subject names shared by the data frames we want to combine, so "A1" becomes "A". Then we can rbind on that match:

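# subject names: strip all trailing digits ("A1" -> "A", "A12" -> "A")
uniq.nmes <- unique(sub("\\d+$", "", names(lst)))
lapply(uniq.nmes, function(x) {
  # exact match on the subject name, so "A" does not also pick up e.g. "AB1"
  do.call(rbind, newlst[sub("\\d+$", "", names(newlst)) == x])
})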
# [[1]]
#       x  y class
# A1.1  1 79     1
# A1.2 30 13     1
# A2.1 90 39     2
# A2.2 43 22     2
# 
# [[2]]
#       x  y class
# B1.1 54 59     1
# B1.2 83 90     1
# B2.1 85 36     2
# B2.2 91 28     2
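The lapply() result is an unnamed list ([[1]], [[2]]). A variant using split() groups the elements by exact subject name and names the result by subject. This is a sketch, not the answer's code; the tagged list below is hypothetical and includes an extra subject "AB" whose name starts with "A", to show that the grouping is exact rather than by substring.

```r
# Hypothetical tagged list, shaped like the output of the Map() step
newlst <- list(A1  = data.frame(x = 1, class = "1"),
               A2  = data.frame(x = 2, class = "2"),
               B1  = data.frame(x = 3, class = "1"),
               AB1 = data.frame(x = 4, class = "1"))

# Subject = element name without its trailing digits
subject <- sub("[0-9]+$", "", names(newlst))

# split() groups list elements by exact subject name; rbind each group
combined <- lapply(split(newlst, subject), function(l) do.call(rbind, l))
names(combined)
```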

Data

A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))

lst <- list(A1=A1,A2=A2,B1=B1,B2=B2)
