I have a list of files. I also have a list of "names" which I extracted with substr() from the actual filenames of these files. I would like to add a new column to each of the files in the list. This column will contain the corresponding element of "names", repeated once for each row in the file.
For example:
df1 <- data.frame(x = 1:3, y=letters[1:3])
df2 <- data.frame(x = 4:6, y=letters[4:6])
filelist <- list(df1,df2)
ID <- c("1A","IB")
Pseudocode:
for( i in length(filelist)){
filelist[i]$SampleID <- rep(ID[i],nrow(filelist[i])
}
# Basically, create a new column in each of the data frames in filelist, and fill the column with the repeated corresponding value of ID.
My output should look like this:
filelist[1] should be:
  x y SampleID
1 1 a       1A
2 2 b       1A
3 3 c       1A
filelist[2] should be:
  x y SampleID
1 4 d       IB
2 5 e       IB
3 6 f       IB
and so on.
Any idea how this could be done?
4 Solutions
#1
29
An alternative solution is to use cbind, taking advantage of the fact that R will recycle the values of a shorter vector.
For example:
x <- df2 # from above
cbind(x, NewColumn="Singleton")
# x y NewColumn
# 1 4 d Singleton
# 2 5 e Singleton
# 3 6 f Singleton
There is no need to use rep. R does that for you.
Therefore, you could put cbind(filelist[[i]], ID[[i]]) in your for loop (assigning the result back to filelist[[i]]) or, as @Sven pointed out, you can use the cleaner mapply:
filelist <- mapply(cbind, filelist, "SampleID"=ID, SIMPLIFY=FALSE)
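The for-loop variant mentioned above is not spelled out, so here is a minimal sketch of what it might look like. It reuses df1, df2, filelist and ID from the question; naming the new column SampleID inside cbind is an assumption made here so that the column gets a proper name.

```r
# Data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# cbind recycles the single ID value over all rows, so rep() is not needed.
# The result must be assigned back, otherwise the list is left unchanged.
for (i in seq_along(filelist)) {
  filelist[[i]] <- cbind(filelist[[i]], SampleID = ID[[i]])
}

filelist[[2]]
```

Without the SampleID = name, the new column would be labelled after the expression passed to cbind rather than getting a meaningful name.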
#2
16
This is a corrected version of your loop:
for( i in seq_along(filelist)){
filelist[[i]]$SampleID <- rep(ID[i],nrow(filelist[[i]]))
}
There were 3 problems:
- A final ) was missing after the command in the body.
- Elements of lists are accessed by [[, not by [. [ returns a list of length one. [[ returns the element only.
- length(filelist) is just one value, so the loop runs for the last element of the list only. I replaced it with seq_along(filelist).
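The second and third points can be checked directly in the console. The following is a small illustrative sketch; the two-element list and the variable names visited and visited_all are made up for the demonstration.

```r
filelist <- list(data.frame(x = 1:3), data.frame(x = 4:6))

# [ keeps the list wrapper, [[ extracts the data frame itself
class(filelist[1])     # "list"
class(filelist[[1]])   # "data.frame"
nrow(filelist[1])      # NULL, because a plain list has no dimensions
nrow(filelist[[1]])    # 3

# length() yields a single number, so the loop body runs once, with i = 2
visited <- integer(0)
for (i in length(filelist)) visited <- c(visited, i)

# seq_along() yields every index, so the loop body runs for i = 1 and i = 2
visited_all <- integer(0)
for (i in seq_along(filelist)) visited_all <- c(visited_all, i)
```

seq_along(filelist) is also safer than 1:length(filelist), since it produces an empty sequence for an empty list instead of c(1, 0).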
A more efficient approach is to use mapply for the task:
mapply(function(x, y) "[<-"(x, "SampleID", value = y) ,
filelist, ID, SIMPLIFY = FALSE)
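If the "[<-" call reads as cryptic, the same idea can be sketched with Map, which is base R's wrapper around mapply with SIMPLIFY = FALSE. This is one possible rewrite rather than the answerer's code; the argument names df and id are chosen here for readability.

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# Pair each data frame with its ID, add the column, and return the data frame.
# Assigning the result back to filelist is what makes the change stick.
filelist <- Map(function(df, id) {
  df$SampleID <- id   # the single value is recycled over all rows
  df
}, filelist, ID)

filelist[[1]]
```

Note that the mapply call above returns a new list as well, so its result also has to be assigned to a variable to be kept.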
#3
2
A tricky way:
library(plyr)
names(filelist) <- ID
result <- ldply(filelist, data.frame)
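Be aware that ldply stacks everything into one combined data frame, with the list names in an .id column, instead of returning a list of data frames. For readers without plyr, a rough base R sketch of the same idea follows; it is an approximation, not a drop-in replacement, and the column name SampleID and the helper variable tagged are choices made here.

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# Tag each data frame with its ID, then stack all of them row-wise
tagged <- Map(function(df, id) cbind(SampleID = id, df), filelist, ID)
result <- do.call(rbind, tagged)
rownames(result) <- NULL

result
```

This is handy when the files are going to be analysed together anyway, since the ID column keeps track of which file each row came from.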
#4
0
This one worked for me:
Create a new column for every data frame in a list, and fill the values of the new column based on an existing column (in your case, the IDs).
Example:
# Create dummy data
df1<-data.frame(a = c(1,2,3))
df2<-data.frame(a = c(5,6,7))
# Create a list
l<-list(df1, df2)
> l
[[1]]
a
1 1
2 2
3 3
[[2]]
a
1 5
2 6
3 7
# add new column 'b'
# create 'b' values based on column 'a'
l2<-lapply(l, function(x)
cbind(x, b = x$a*4))
Results in:
> l2
[[1]]
a b
1 1 4
2 2 8
3 3 12
[[2]]
a b
1 5 20
2 6 24
3 7 28
In your case, something like:
filelist<-lapply(filelist, function(x)
cbind(x, b = x$SampleID))
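Note that the snippet above assumes each data frame already has a SampleID column to copy from. In the original question the IDs live in a separate vector, so lapply over the list alone cannot see which ID belongs to which data frame. One possible workaround, sketched here with the question's data, is to iterate over the indices instead:

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
filelist <- list(df1, df2)
ID <- c("1A", "IB")

# Iterate over positions so that filelist[[i]] and ID[i] stay paired
filelist <- lapply(seq_along(filelist), function(i) {
  cbind(filelist[[i]], SampleID = ID[i])
})

filelist[[2]]
```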