I would like to save a whole bunch of relatively large data frames while minimizing the space that the files take up. When opening the files, I need to be able to control what names they are given in the workspace.
Basically I'm looking for the semantics of dput and dget, but with binary files.
Example:
n <- 10000
for (i in 1:100) {
  dat <- data.frame(a = rep(c("Item 1", "Item 2"), n / 2), b = rnorm(n),
                    c = rnorm(n), d = rnorm(n), e = rnorm(n))
  dput(dat, paste("data", i, sep = ""))
}
## much later
## extract 3 random data sets and bind them
for (i in 1:10) {
  nums <- sample(1:100, 3)
  comb <- rbind(dget(paste("data", nums[1], sep = "")),
                dget(paste("data", nums[2], sep = "")),
                dget(paste("data", nums[3], sep = "")))
  ## do stuff here
}
2 Answers
#1
21
Your best bet is to use rda files. You can use the save() and load() commands to write and read:
set.seed(101)
a = data.frame(x1=runif(10), x2=runif(10), x3=runif(10))
save(a, file="test.rda")
load("test.rda")
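One thing worth spelling out: load() always restores objects under the names they were saved with, and it returns those names invisibly. A minimal sketch of that behaviour follows, assuming a temporary file path (rda_file is a name made up for this illustration) and using the compress argument of save() to ask for xz compression, since file size matters here:

```r
set.seed(101)
a <- data.frame(x1 = runif(10), x2 = runif(10), x3 = runif(10))

# Hypothetical location for the demo; any writable path would do.
rda_file <- file.path(tempdir(), "test.rda")

# compress accepts TRUE/FALSE or one of "gzip", "bzip2", "xz".
save(a, file = rda_file, compress = "xz")

original <- a
rm(a)                            # drop it from the workspace

loaded_names <- load(rda_file)   # recreates `a`; you cannot pick another name here
print(loaded_names)              # "a"
```

So with plain save()/load() the name is fixed at save time, which is why some kind of wrapper is needed if you want to choose the name when reading.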
Edit: For completeness, just to cover what Harlan's suggestion might look like (i.e. wrapping the load command to return the data frame):
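loadx <- function(name, file) {
  env <- new.env()
  load(file, envir = env)   # load into a private environment, not the workspace
  get(name, envir = env)    # return the object that was saved under `name`
}
b <- loadx("a", "test.rda")  # pass the saved name as a string; assign to any name you like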
Alternatively, have a look at the hdf5, RNetCDF and ncdf packages. I've experimented with the hdf5 package in the past; this uses the NCSA HDF5 library. It's very simple:
hdf5save(fileout, ...)
hdf5load(file, load = TRUE, verbosity = 0, tidy = FALSE)
A last option is to use binary file connections, but that won't work well in your case because readBin and writeBin only support vectors.
Here's a trivial example. First open the connection in write mode ("w") with "b" appended for binary, and write some data:
zz <- file("testbin", "wb")
writeBin(1:10, zz)
close(zz)
Then open the connection in read mode ("r"), again with "b" appended, and read the data back (here just the first 4 integers):
zz <- file("testbin", "rb")
readBin(zz, integer(), 4)
close(zz)
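To make the "vectors only" limitation concrete, here is a rough sketch of what storing a single numeric column this way could look like. The file name and the length-header layout are assumptions made up for this illustration, not any standard format; a whole data frame would need this for every column, plus separate handling for factors and strings.

```r
# Hypothetical path and layout: one integer (the length) followed by the doubles.
bin_file <- file.path(tempdir(), "col_b.bin")
b <- c(0.5, 1.25, -3)

con <- file(bin_file, "wb")
writeBin(length(b), con)   # header: number of values, written as an integer
writeBin(b, con)           # payload: the doubles themselves
close(con)

con <- file(bin_file, "rb")
len <- readBin(con, integer(), n = 1)        # read the header first
b_back <- readBin(con, numeric(), n = len)   # then exactly `len` doubles
close(con)
```

Note that readBin has to be told the type and how many values to read, which is why you end up inventing your own header.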
#2
12
You may have a look at saveRDS and readRDS. They are functions for serialization.
x = data.frame(x1=runif(10), x2=runif(10), x3=runif(10))
saveRDS(x, file="myDataFile.rds")
x <- readRDS(file="myDataFile.rds")
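Applied to the workflow in the question, this might look roughly like the sketch below. The directory, the rds_path() helper and the smaller sizes (10 files of 1000 rows) are assumptions chosen to keep the demo quick; compress = "xz" is one of the documented options and usually trades speed for smaller files.

```r
n <- 1000
out_dir <- file.path(tempdir(), "rds_demo")   # hypothetical output directory
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical helper: builds the file name for data set i.
rds_path <- function(i) file.path(out_dir, paste0("data", i, ".rds"))

set.seed(1)
for (i in 1:10) {
  dat <- data.frame(a = rep(c("Item 1", "Item 2"), n / 2),
                    b = rnorm(n), c = rnorm(n))
  saveRDS(dat, rds_path(i), compress = "xz")  # one object per file, no name stored
}

## much later: pick 3 random data sets and bind them
nums <- sample(1:10, 3)
comb <- do.call(rbind, lapply(nums, function(i) readRDS(rds_path(i))))
dim(comb)
```

Because readRDS() returns the object instead of creating a variable, you decide the name at read time, which is the dget-like behaviour the question asks for.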