Getting a specific object from an Rdata file

Time: 2022-03-27 04:48:23

I have an Rdata file containing various objects:

 New.Rdata
  |_ Object 1  (e.g. data.frame)
  |_ Object 2  (e.g. matrix)
  |_...
  |_ Object n

Of course I can load the data frame with load('New.Rdata'). However, is there a smart way to load only one specific object from this file and discard the others?

3 answers

#1 (score: 61)

.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only entries you like, but it's not easy since you can't do it at the R level.


However, you can simply convert the .RData file into a lazy-load database which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:


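# Convert New.Rdata into a lazy-load database (New.rdb + New.rdx)
e <- local({load("New.Rdata"); environment()})  # load everything into a private environment
tools:::makeLazyLoadDB(e, "New")                # unexported function, hence the triple colon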

Loading the DB then only loads the index but not the contents. The contents are loaded as they are used:


lazyLoad("New")  # loads only the index, not the contents
ls()             # the object names are visible immediately
x                # if New.Rdata contained x, it is fetched from New.rdb now

Just like with load() you can specify an environment to load into so you don't need to pollute the global workspace etc.
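
As a minimal sketch of that last point, the example below builds its own small .RData file in a temporary location so that it can run anywhere; the names base, src, db and only_x are made up for illustration and are not part of the original answer. It relies on the envir and filter arguments of lazyLoad(), and on tools:::makeLazyLoadDB, which is unexported and could change between R versions.

```r
# Self-contained demo: write a small .RData file to a temporary location.
base  <- tempfile("New")                 # hypothetical file base, no extension
rdata <- paste0(base, ".RData")
x <- data.frame(a = 1:3)
y <- matrix(0, nrow = 2, ncol = 2)
save(x, y, file = rdata)
rm(x, y)

# Convert it once into a lazy-load database (<base>.rdb and <base>.rdx).
src <- new.env()
load(rdata, envir = src)
tools:::makeLazyLoadDB(src, base)

# Expose every object lazily, but in a private environment.
db <- new.env()
lazyLoad(base, envir = db)
all_names <- sort(ls(db))                # names only; nothing deserialized yet
first <- db$x                            # x is read from the .rdb file here

# Expose only the object named "x" by passing a filter function.
only_x <- new.env()
lazyLoad(base, envir = only_x, filter = function(name) name == "x")
x_names <- ls(only_x)
```

The filter function receives the vector of object names stored in the database and returns a logical vector, so only the matching objects become visible in the target environment.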


#2 (score: 12)

You can use attach() rather than load(). This attaches the contents of the file to the search path; you can then copy the one object you are interested in and detach the .Rdata file again.

This still loads everything, but is simpler to work with than loading everything into the global workspace (possibly overwriting things you don't want overwritten) then getting rid of everything you don't want.
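
A rough sketch of this approach, assuming a throwaway file created just for the demonstration; the search-path label "rdata_tmp" and the variable names are invented for this example. A second variant shows load() with an explicit environment, which has the same effect without touching the search path.

```r
# Self-contained demo file containing two objects.
rdata <- tempfile(fileext = ".RData")
x <- data.frame(a = 1:3)
y <- matrix(0, nrow = 2, ncol = 2)
save(x, y, file = rdata)
rm(x, y)

# Attach the file: all of its objects are loaded into a new entry
# on the search path, not into the global workspace.
attach(rdata, name = "rdata_tmp", warn.conflicts = FALSE)
wanted <- get("x", pos = "rdata_tmp")    # copy the one object of interest
detach("rdata_tmp", character.only = TRUE)

# Variant: load into a scratch environment and pick the object from there.
scratch <- new.env()
load(rdata, envir = scratch)
same <- scratch$x
rm(scratch)
```

Both variants still read the whole file; they only control where the unwanted objects end up.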


#3 (score: 4)

Simon Urbanek's answer (#1 above) is very, very nice. A drawback is that it doesn't seem to work if an object to be saved is too large:

tools:::makeLazyLoadDB(
  local({
    x <- 1:1e+09
    cat("size:", object.size(x), "\n")
    environment()
  }), "lazytest")
size: 4e+09 
Error: serialization is too large to store in a raw vector

I'm guessing that this is due to a limitation of the current implementation of R (I am using 2.15.2) rather than to running out of physical memory and swap. The saves package might be an alternative for some uses, however.
