I have two directories, each containing many files. The file names are the same in both directories. I'd like to apply a function (for instance a correlation test, extracting the estimate) to dir1/file1 and dir2/file1, repeat this for every pair of files with matching names, and store the results as a data frame.
I'm trying something like this:
f1 = list.files("path1", "*abc.csv")
f2 = list.files("path2", "*abc.csv")
for (i in 1:length(f1)) {
tmp <- as.matrix(read.csv(f1[i], header=FALSE))
tmp2 <- as.matrix(read.csv(f2[i], header=FALSE))
c = cor.test(tmp,tmp2)
lst[[f1[i]]] <- c$estimate
}
But I'm a little stuck on matching the filenames, and I also suspect that apply plus a match call might be a better choice. I've searched and found solutions for importing multiple files and applying a function to them, but none for importing two batches whose files have identical names.
2 solutions
#1
2
I think you could do something like this:
# Read the same-named file from both folders and return the correlation estimate
get.cor <- function(name, path1 = "path1", path2 = "path2") {
  # file.path inserts the separator, so the folder names need no trailing slash
  f1 <- file.path(path1, name)
  f2 <- file.path(path2, name)
  m1 <- as.matrix(read.csv(f1, header = TRUE))
  m2 <- as.matrix(read.csv(f2, header = TRUE))
  cor.test(m1, m2)$estimate
}
# Some toy folders and data
dir.create("tmpfolder")
dir.create("tmpfolder2")
set.seed(123)
m1 <- matrix(rnorm(100), nrow = 10)
m2 <- matrix(rnorm(100), nrow = 10)
cor.test(m1, m2)$estimate
#>         cor
#> -0.04953215
write.csv(m1, "tmpfolder/f1.csv", row.names = FALSE)
write.csv(m2, "tmpfolder2/f1.csv", row.names = FALSE)
# since names are identical one list of names will suffice
f.names <- list.files("tmpfolder")
# now apply the function to each file name
lapply(f.names, function(n) get.cor(n, path1 = "tmpfolder", path2 = "tmpfolder2"))
#> [[1]]
#>         cor
#> -0.04953215
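Since the question asks for the results as a data frame, one possible way to collect them is sketched below. It is self-contained: it writes its own toy files into temporary folders, and the names dir1, dir2, get_estimate and result are illustrative, not part of the answer above. The matrices are flattened with as.vector so that every cell is paired element-wise.

```r
# Toy setup: two temporary folders holding files with the same names
# (dir1, dir2 and the file names are made up for this sketch)
dir1 <- file.path(tempdir(), "sketch_dir1")
dir2 <- file.path(tempdir(), "sketch_dir2")
dir.create(dir1, showWarnings = FALSE)
dir.create(dir2, showWarnings = FALSE)
set.seed(1)
for (nm in c("a.csv", "b.csv")) {
  write.csv(matrix(rnorm(20), nrow = 5), file.path(dir1, nm), row.names = FALSE)
  write.csv(matrix(rnorm(20), nrow = 5), file.path(dir2, nm), row.names = FALSE)
}

# Correlation estimate for one pair of same-named files
get_estimate <- function(name) {
  m1 <- as.matrix(read.csv(file.path(dir1, name)))
  m2 <- as.matrix(read.csv(file.path(dir2, name)))
  unname(cor.test(as.vector(m1), as.vector(m2))$estimate)
}

# One listing is enough because the names are identical in both folders
f.names <- list.files(dir1, pattern = "\\.csv$")

# vapply returns a plain numeric vector, one value per file
result <- data.frame(
  file = f.names,
  estimate = vapply(f.names, get_estimate, numeric(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
result
```

Using vapply instead of lapply here avoids a separate unlist step and fails early if any pair does not yield exactly one number.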
#2
0
I would first read all files as matrices, then get all correlations using mapply, which is faster and neater.
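# read file paths (full.names = TRUE so read.csv can find the files;
# the pattern is a regular expression, not a shell glob)
f1 = list.files("path1", pattern = "\\.csv$", full.names = TRUE)
f2 = list.files("path2", pattern = "\\.csv$", full.names = TRUE)
# order the files by name so they match each other in both lists
f1 = f1[order(basename(f1))]
f2 = f2[order(basename(f2))]
# load them as matrices
f11 = lapply(f1, function(x) as.matrix(read.csv(x)))
f22 = lapply(f2, function(x) as.matrix(read.csv(x)))
# generate the correlations
cor_tests = mapply(cor.test, f11, f22)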
An example with dummy data:
f1 = list(rnorm(100), rnorm(100))
f2 = list(2*rnorm(100), 2*rnorm(100))
ab = mapply(cor.test, f1, f2)
ab[rownames(ab) == "estimate"]
[[1]]
cor
-0.1024785
[[2]]
cor
0.1020779
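Sorting both listings only lines the files up when the two folders hold exactly the same set of names. A more defensive variant is sketched below, assuming that unmatched files should simply be skipped: it pairs files through intersect on the file names and then builds a data frame from the mapply results. The folder names and the helpers write_toy and read_mat are hypothetical.

```r
# Toy folders (hypothetical names); dir_b has one extra, unmatched file
dir_a <- file.path(tempdir(), "sketch_a")
dir_b <- file.path(tempdir(), "sketch_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)
set.seed(42)
write_toy <- function(dir, name) {
  write.csv(matrix(rnorm(30), nrow = 10), file.path(dir, name), row.names = FALSE)
}
for (nm in c("x.csv", "y.csv")) write_toy(dir_a, nm)
for (nm in c("x.csv", "y.csv", "extra.csv")) write_toy(dir_b, nm)

# Keep only the file names present in both folders
common <- intersect(list.files(dir_a, pattern = "\\.csv$"),
                    list.files(dir_b, pattern = "\\.csv$"))

read_mat <- function(dir, name) as.matrix(read.csv(file.path(dir, name)))

# mapply walks both lists in parallel; returning a single number per pair
# lets it simplify the result to a numeric vector
estimates <- mapply(
  function(a, b) unname(cor.test(as.vector(a), as.vector(b))$estimate),
  lapply(common, read_mat, dir = dir_a),
  lapply(common, read_mat, dir = dir_b)
)

cor_df <- data.frame(file = common, estimate = estimates, stringsAsFactors = FALSE)
cor_df
```

Here extra.csv is dropped because it has no partner in the first folder; whether to skip or to stop with an error in that case is a design choice that depends on the data.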