Add a new column to each element in a list of tables or data frames

Time: 2022-03-18 18:36:39

I have a list of files (data frames). I also have a vector of "names" that I extracted with substr() from the actual filenames of these files. I would like to add a new column to each of the files in the list. This column should contain the corresponding element of "names", repeated once for every row in the file.

For example:

df1 <- data.frame(x = 1:3, y=letters[1:3])
df2 <- data.frame(x = 4:6, y=letters[4:6])
filelist <- list(df1,df2)
ID <- c("1A","IB")

Pseudocode

  for( i in length(filelist)){

       filelist[i]$SampleID <- rep(ID[i],nrow(filelist[i])

  }

// Basically, create a new column in each of the data frames in filelist, and fill that column with the repeated corresponding value of ID.

My output should look like this:

filelist[1] should be:

  x y SampleID
1 1 a       1A
2 2 b       1A
3 3 c       1A

filelist[2] should be:

  x y SampleID
1 4 d       IB
2 5 e       IB
3 6 f       IB

and so on.....

Any idea how this could be done?

4 solutions

#1


29  

An alternative solution is to use cbind and take advantage of the fact that R recycles the values of a shorter vector.

For example:

x <- df2  # from above
cbind(x, NewColumn="Singleton")
 #    x y NewColumn
 #  1 4 d Singleton
 #  2 5 e Singleton
 #  3 6 f Singleton

There is no need to use rep; R does the repetition for you.

Therefore, you could put filelist[[i]] <- cbind(filelist[[i]], SampleID = ID[i]) in your for loop or, as @Sven pointed out, you can use the cleaner mapply:

filelist <- mapply(cbind, filelist, "SampleID"=ID, SIMPLIFY=FALSE)
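
To make the result concrete, here is a minimal sketch of the mapply call run on the question's sample data. The variable name labelled is only an illustrative choice, not part of the original answer.

```r
# Sample data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# mapply() walks filelist and ID in parallel; cbind() recycles the
# single ID value down every row of the matching data frame
labelled <- mapply(cbind, filelist, "SampleID" = ID, SIMPLIFY = FALSE)

print(labelled[[2]])
```

SIMPLIFY = FALSE matters here: it keeps the result as a list of data frames instead of letting mapply try to collapse it into a matrix.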

#2


16  

This is a corrected version of your loop:

for( i in seq_along(filelist)){

  filelist[[i]]$SampleID <- rep(ID[i],nrow(filelist[[i]]))

}

There were three problems:

  • A closing ) was missing at the end of the command in the loop body.
  • Elements of a list are accessed with [[, not with [. Using [ returns a list of length one; [[ returns the element itself.
  • length(filelist) is just a single value, so the loop runs for the last element of the list only. I replaced it with seq_along(filelist).
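
The last two points can be checked directly. The following is a small sketch with throwaway data; the names visited_length and visited_seq are made up for the illustration.

```r
filelist <- list(data.frame(x = 1:3), data.frame(x = 4:6))

# Single brackets keep the list wrapper; double brackets extract the element
print(class(filelist[1]))    # "list"
print(class(filelist[[1]]))  # "data.frame"

# length() yields one number, so this loop body runs once, with i = 2
visited_length <- c()
for (i in length(filelist)) visited_length <- c(visited_length, i)

# seq_along() yields 1, 2, ..., so every element is visited
visited_seq <- c()
for (i in seq_along(filelist)) visited_seq <- c(visited_seq, i)

print(visited_length)  # 2
print(visited_seq)     # 1 2
```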

A more concise approach is to use mapply for the task:

mapply(function(x, y) "[<-"(x, "SampleID", value = y) ,
       filelist, ID, SIMPLIFY = FALSE)
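
If calling "[<-" as a function reads awkwardly, the same idea can be sketched with Map, which is mapply with SIMPLIFY = FALSE. The helper name add_id is hypothetical and not part of the original answer.

```r
# Sample data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# Hypothetical helper: add the column with ordinary `$<-` assignment
# (the length-one id is recycled) and return the modified data frame
add_id <- function(df, id) {
  df$SampleID <- id
  df
}

# Map() pairs each data frame with its ID and always returns a list
filelist <- Map(add_id, filelist, ID)

print(filelist[[1]])
```

Note that the result has to be assigned back to filelist; neither mapply nor Map modifies the list in place.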

#3


2  

A tricky way:

library(plyr)

names(filelist) <- ID
result <- ldply(filelist, data.frame)
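
Unlike the other answers, this produces one combined data frame with an identifier column rather than a list of data frames. For readers without plyr, a rough base R analogue (a swapped-in technique, not the answer's own code) might look like the sketch below; the column name SampleID and the variable labelled are choices made for this illustration.

```r
# Sample data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# Label each data frame, putting the identifier in the first column
labelled <- Map(function(df, id) cbind(SampleID = id, df), filelist, ID)

# Stack the labelled data frames row-wise into a single data frame
result <- do.call(rbind, labelled)

print(result)
```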

#4


0  

This one worked for me:

Create a new column for every data frame in a list, and fill the new column with values based on an existing column (in your case, the IDs).

Example:

# Create dummy data
df1<-data.frame(a = c(1,2,3))
df2<-data.frame(a = c(5,6,7))

# Create a list
l<-list(df1, df2)

> l
[[1]]
  a
1 1
2 2
3 3

[[2]]
  a
1 5
2 6
3 7

# add new column 'b'
# create 'b' values based on column 'a' 
l2<-lapply(l, function(x) 
  cbind(x, b = x$a*4))

Results in:

> l2
[[1]]
  a  b
1 1  4
2 2  8
3 3 12

[[2]]
  a  b
1 5 20
2 6 24
3 7 28

In your case, something like:

# iterate over positions so each data frame can be paired with its ID
filelist<-lapply(seq_along(filelist), function(i) 
  cbind(filelist[[i]], SampleID = ID[i]))
