I am trying to merge several data.frames into one data.frame. Since I have a whole list of files, I am trying to do it with a loop structure.
So far the loop approach works fine. However, it looks pretty inefficient and I am wondering if there is a faster and easier approach.
Here is the scenario: I have a directory with several .csv files. Each file contains the same identifier columns, which can be used as the merge keys. Since the files are rather large, I decided to read them into R one at a time instead of all at once. So I get all the files in the directory with list.files and read in the first two. Then I use merge to get one data.frame.
FileNames <- list.files(path=".../tempDataFolder/")
FirstFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[1], sep=""),
                      header=TRUE, na.strings="NULL")
SecondFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[2], sep=""),
                       header=TRUE, na.strings="NULL")
dataMerge <- merge(FirstFile, SecondFile, by=c("COUNTRYNAME", "COUNTRYCODE", "Year"),
                   all=TRUE)
Now I use a for loop to read all the remaining .csv files and merge them into the already existing data.frame:
for(i in 3:length(FileNames)){
  ReadInMerge <- read.csv(file=paste(".../tempDataFolder/", FileNames[i], sep=""),
                          header=TRUE, na.strings="NULL")
  dataMerge <- merge(dataMerge, ReadInMerge, by=c("COUNTRYNAME", "COUNTRYCODE", "Year"),
                     all=TRUE)
}
Even though it works just fine I was wondering if there is a more elegant way to get the job done?
2 solutions
#1
37
You may want to look at the closely related question on *.
I would approach this in two steps: import all the data (with plyr), then merge it together:
filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
library(plyr)
import.list <- llply(filenames, read.csv)
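For readers without plyr, the import step can be sketched in base R. This is a minimal sketch, not the answerer's code: lapply is swapped in for llply, na.strings="NULL" is carried over from the question, and the demo directory and file contents are hypothetical stand-ins for the real folder.

```r
# Hypothetical demo directory with two tiny CSV files (illustration only)
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("COUNTRYCODE,Year,gdp", "DE,2000,1.9", "FR,2000,NULL"),
           file.path(demo_dir, "gdp.csv"))
writeLines(c("COUNTRYCODE,Year,pop", "DE,2000,82", "FR,2000,61"),
           file.path(demo_dir, "pop.csv"))

# full.names = TRUE returns complete paths, so no paste() is needed later
filenames <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Extra arguments after the function are passed on to read.csv for every file
import.list <- lapply(filenames, read.csv, header = TRUE, na.strings = "NULL")

# Naming the list elements makes it easier to see which file each came from
names(import.list) <- basename(filenames)
```

The result is a list with one data.frame per file, which is the same shape of object that llply(filenames, read.csv) produces.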
That will give you a list of all the files that you now need to merge together. There are many ways to do this, but here's one approach (with Reduce):
data <- Reduce(function(x, y) merge(x, y, all=TRUE,
                                    by=c("COUNTRYNAME", "COUNTRYCODE", "Year")),
               import.list, accumulate=FALSE)
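To see what the Reduce call does, here is a small self-contained sketch on toy data. The three data frames and their value columns (gdp, pop, area) are made up for illustration; only the key columns come from the question.

```r
# Toy inputs sharing the key columns from the question (values are invented)
df1 <- data.frame(COUNTRYNAME = c("Germany", "France"),
                  COUNTRYCODE = c("DE", "FR"),
                  Year = c(2000, 2000),
                  gdp = c(1.9, 1.4))
df2 <- data.frame(COUNTRYNAME = c("Germany", "Italy"),
                  COUNTRYCODE = c("DE", "IT"),
                  Year = c(2000, 2000),
                  pop = c(82, 57))
df3 <- data.frame(COUNTRYNAME = "Germany",
                  COUNTRYCODE = "DE",
                  Year = 2000,
                  area = 357)

keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Reduce folds the list left to right:
# merge(merge(df1, df2), df3), each step a full outer join on the keys
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE),
                 list(df1, df2, df3))
```

Because all=TRUE keeps unmatched rows, every country that appears in any input survives, and the columns it lacks are filled with NA.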
Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:
library(reshape)
data <- merge_recurse(import.list)
#2
1
If I'm not mistaken, a pretty simple change could eliminate the 3:length(FileNames) kludge:
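FileNames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
dataMerge <- NULL
for(f in FileNames){
  ReadInMerge <- read.csv(file=f, header=TRUE, na.strings="NULL")
  # merge() fails on an empty data.frame() because it lacks the key columns,
  # so the first file seeds the result and later files are merged into it
  dataMerge <- if(is.null(dataMerge)) ReadInMerge else
    merge(dataMerge, ReadInMerge,
          by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=TRUE)
}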
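As a side note on why the 3:length(FileNames) index is fragile: when there are fewer than three files, the colon operator counts downwards instead of producing an empty sequence. A minimal sketch, using two hypothetical file names:

```r
# Hypothetical folder listing with only two files
FileNames <- c("a.csv", "b.csv")

# 3:2 counts down, so a loop over it would run with i = 3 and then i = 2
idx_colon <- 3:length(FileNames)

# Dropping the first two positions from seq_along() gives an empty index,
# so a loop over it simply does not run
idx_safe <- seq_along(FileNames)[-(1:2)]
```

Looping directly over the file names, as in the answer above, avoids the index arithmetic entirely.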