Get a specific object from an Rdata file

Time: 2022-03-27 04:53:41

I have a Rdata file containing various objects:

New.Rdata
 |_ Object 1  (e.g. data.frame)
 |_ Object 2  (e.g. matrix)
 |_ ...
 |_ Object n

Of course I can load the data frame with load('New.Rdata'). However, is there a smart way to load only one specific object from this file and discard the others?

3 solutions

#1


61  

.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only entries you like, but it's not easy since you can't do it at the R level.
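
Since there is no index, any approach written at the R level still has to deserialize the whole file. The following is only a minimal sketch of that fallback: it loads everything into a throwaway environment and returns one object, so the global workspace stays clean. The helper name load_one, the object names, and the temporary file are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): load a whole .RData
# file into a throwaway environment and return only the object `name`.
load_one <- function(file, name) {
  scratch <- new.env(parent = emptyenv())
  load(file, envir = scratch)  # still deserializes every object in the file
  if (!exists(name, envir = scratch, inherits = FALSE)) {
    stop("object '", name, "' not found in ", file)
  }
  get(name, envir = scratch, inherits = FALSE)
}

# Demo with a temporary file standing in for New.Rdata
rdata_file <- tempfile(fileext = ".RData")
scores <- data.frame(a = 1:3)
mat <- matrix(1:4, nrow = 2)
save(scores, mat, file = rdata_file)
rm(scores, mat)

only_scores <- load_one(rdata_file, "scores")
```

The scratch environment is dropped when the function returns, so the other objects become eligible for garbage collection, but they were still read from disk.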

However, you can simply convert the .RData file into a lazy-load database which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:

# convert .RData -> .rdb/.rdx
e = local({load("New.RData"); environment()})
tools:::makeLazyLoadDB(e, "New")

Loading the DB then only loads the index but not the contents. The contents are loaded as they are used:

lazyLoad("New")
ls()
x # if you had x in the New.RData it will be fetched now from New.rdb

Just like with load() you can specify an environment to load into so you don't need to pollute the global workspace etc.
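
As a sketch of that last point, the example below runs the conversion end to end in a temporary directory, then calls lazyLoad() with its envir argument and its optional filter argument so that only one name is set up in a separate environment. The directory, the object names x and y, and the variable names are assumptions made for illustration. Note that tools:::makeLazyLoadDB is an unexported function, so its behaviour may differ between R versions.

```r
# Build a small .RData file to stand in for New.RData (hypothetical data)
work_dir <- tempfile("lazydemo")
dir.create(work_dir)
rdata_file <- file.path(work_dir, "New.RData")
x <- 1:10
y <- letters
save(x, y, file = rdata_file)
rm(x, y)

# Convert .RData -> .rdb/.rdx, as in the answer above
e <- local({load(rdata_file); environment()})
db_base <- file.path(work_dir, "New")
tools:::makeLazyLoadDB(e, db_base)

# Set up promises in a separate environment, and only for the name "x"
target <- new.env()
lazyLoad(db_base, envir = target, filter = function(nm) nm == "x")

ls(target)  # lists only the names that passed the filter
target$x    # the value is fetched from New.rdb on first access
```

The conversion step itself still loads the full .RData file once; the saving comes on later sessions, which only need the .rdb/.rdx pair.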

#2


14  

You can use attach rather than load, which attaches the data objects to the search path; you can then copy the one object you are interested in and detach the .Rdata object.

This still loads everything, but is simpler to work with than loading everything into the global workspace (possibly overwriting things you don't want overwritten) then getting rid of everything you don't want.
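
A minimal sketch of the attach approach, assuming a temporary file in place of New.Rdata and the hypothetical object names scores and mat. The search-path entry name "rdata_demo" is also an assumption; without a name argument, attach labels the entry "file:" followed by the path.

```r
# Temporary file standing in for New.Rdata (hypothetical data)
rdata_file <- tempfile(fileext = ".RData")
scores <- data.frame(a = 1:3)
mat <- matrix(1:4, nrow = 2)
save(scores, mat, file = rdata_file)
rm(scores, mat)

# Attach the file to the search path instead of loading into the workspace
attach(rdata_file, name = "rdata_demo", warn.conflicts = FALSE)

# Copy the one object of interest into the current environment
wanted <- get("scores", envir = as.environment("rdata_demo"))

# Remove the attached file from the search path again
detach("rdata_demo", character.only = TRUE)
```

After the detach, only wanted remains in the workspace; mat was never copied into it.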

#3


4  

Simon Urbanek's answer is very, very nice. A drawback is that it doesn't seem to work if an object to be saved is too large:

tools:::makeLazyLoadDB(
  local({
    x <- 1:1e+09
    cat("size:", object.size(x), "\n")
    environment()
  }), "lazytest")
size: 4e+09
Error: serialization is too large to store in a raw vector

I'm guessing that this is due to a limitation of the current implementation of R (I have 2.15.2) rather than running out of physical memory and swap. The saves package might be an alternative for some uses, however.
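
This is not the saves package itself, but a related base R idea can be sketched with saveRDS() and readRDS(): write each object to its own file so that a single one can be read back later. The directory and object names below are hypothetical. The sketch does not check whether this avoids the serialization limit reported above.

```r
# Hypothetical objects collected in an environment
objs <- new.env()
assign("scores", data.frame(a = 1:3), envir = objs)
assign("mat", matrix(1:4, nrow = 2), envir = objs)

# Write one .rds file per object into a temporary directory
out_dir <- tempfile("objects")
dir.create(out_dir)
for (nm in ls(objs)) {
  saveRDS(get(nm, envir = objs), file.path(out_dir, paste0(nm, ".rds")))
}

# Later: read back only the object that is needed
mat_only <- readRDS(file.path(out_dir, "mat.rds"))
```

Here the file names act as the index that a single .RData file lacks.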
