I'm trying to import a single 10 GB CSV file with roughly 10 million observations and export it in pieces. I want about 10 manageable RData files in the end (data_1.RData, data_2.RData, etc.), but I'm having trouble making skip and nrows dynamic. My nrows will never change, since I need almost 1 million rows per dataset, but I think I'll need some formula for skip= so that it increases on every loop iteration to catch the next 1 million rows. Also, having header=T might mess up anything after ii=1, since only the first row contains the variable names. The following is the bulk of the code I'm working with:
for (ii in 1:10){
data <- read.csv("myfolder/file.csv",
row.names=NULL, header=T, sep=",", stringsAsFactors=F,
skip=0, nrows=1000000)
outName <- paste("data",ii,sep="_")
save(data,file=file.path(outPath,paste(outName,".RData",sep="")))
}
1 solution
#1
(Untested but...) You can try something like this:
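nrows <- 1000000
# Lines to skip before each chunk: 1 (the header line) for the first chunk,
# then 1 + nrows, 1 + 2 * nrows, ... for the chunks that follow.
ind <- seq(from = 1, by = nrows, length.out = 10)
# Read the column names once from the header line.
header <- names(read.csv("myfolder/file.csv", header = TRUE, nrows = 1))
for (i in seq_along(ind)) {
  data <- read.csv("myfolder/file.csv",
                   row.names = NULL, header = FALSE,
                   sep = ",", stringsAsFactors = FALSE,
                   skip = ind[i], nrows = nrows)
  names(data) <- header
  # Use the loop variable i (not ii) so each chunk gets its own file name.
  outName <- paste("data", i, sep = "_")
  # outPath is the output directory from the question and must already be defined.
  save(data, file = file.path(outPath, paste(outName, ".RData", sep = "")))
}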
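Each read.csv call with skip= has to scan the file from the top again, which gets slow for the later chunks of a 10 GB file. As a possible alternative, here is a minimal sketch that keeps one connection open so that every read continues where the previous one stopped. The function name split_csv and its arguments are made up for illustration. It assumes a plain comma-separated header line and no quoted fields with embedded line breaks.

```r
# Hypothetical helper, not part of the original answer: split a CSV into
# RData chunks while reading the file only once through an open connection.
split_csv <- function(path, out_dir, chunk_rows = 1000000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Consume the header line once; later reads continue after it.
  header <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                 quiet = TRUE)

  i <- 0
  repeat {
    # read.csv() signals an error when no lines are left, so any error
    # here is treated as "end of file" (a simplification for this sketch).
    data <- tryCatch(
      read.csv(con, header = FALSE, col.names = header,
               nrows = chunk_rows, stringsAsFactors = FALSE),
      error = function(e) NULL
    )
    if (is.null(data) || nrow(data) == 0) break
    i <- i + 1
    save(data, file = file.path(out_dir, paste0("data_", i, ".RData")))
  }
  i  # number of chunk files written
}
```

With the names from the question, this would be called as split_csv("myfolder/file.csv", outPath). Because column types are guessed separately for each chunk, passing colClasses explicitly may be worth considering if the chunks need identical types.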