Behavior of saveRDS() and readRDS() with respect to object attributes

Time: 2021-10-21 05:48:31

Do saveRDS and readRDS save and restore, respectively, all of an object's attributes, including ones created by an application (via attr)? I tried this approach instead of save and load while looking for a workaround for my problem linked below. However, that does not seem to be the case, unless I'm doing something wrong: Can I access R data objects' attributes without fully loading objects from file?

2 Solutions

#1


13  

Yes, they do:

test <- structure(1:10, names=LETTERS[1:10], color='red', xxx='yyy')
attr(test, which='uuu') <- 'zzz'
test
##  A  B  C  D  E  F  G  H  I  J 
##  1  2  3  4  5  6  7  8  9 10 
## attr(,"color")
## [1] "red"
## attr(,"xxx")
## [1] "yyy"
## attr(,"uuu")
## [1] "zzz"
saveRDS(test, '/tmp/test.rds')
test2 <- readRDS('/tmp/test.rds')
identical(test, test2)
## [1] TRUE
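
The same round trip can be checked on other object types. The following is a minimal sketch, assuming a small data frame with a made-up `source_note` attribute and a temporary file from `tempfile()` instead of a fixed path; both are illustrative choices rather than part of the answer above.

```r
# Hypothetical example: a data frame carrying an application-defined attribute.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
attr(df, "source_note") <- "created for illustration"

path <- tempfile(fileext = ".rds")   # temporary file instead of /tmp/...
saveRDS(df, path)
df2 <- readRDS(path)

# Built-in attributes (names, class, row.names) and the custom one are restored.
names(attributes(df2))
attr(df2, "source_note")
identical(df, df2)
```

Comparing with `identical()` checks the values and every attribute at once, so it is a convenient single test that nothing was dropped.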

R relies heavily on these functions (as well as their variants). For example, the closely related save() and save.image() are used to save the user's workspace. Thus, it would be odd if they did not store the attributes.

However, note that some "dynamically created" objects cannot be stored this way. These include file and SQL database connection handles, temporary SQL result handles, etc. An example with an Rcpp-compiled function:

library('Rcpp')
library('inline')   
cppFunction("int one() { return 1; }")
one() # it works
## [1] 1
one # contains a pointer to dynamically allocated mem chunk
## function () 
## .Primitive(".Call")(<pointer: 0x7f52c33a7680>)
saveRDS(one, '/tmp/one.rds')

Now we restart R...

one <- readRDS('/tmp/one.rds')
one # the pointer is no longer valid
## function () 
## .Primitive(".Call")(<pointer: (nil)>)
one() # doesn't work
## Error in .Primitive(".Call")(<pointer: (nil)>) : 
##  NULL value passed as symbol address
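
One possible way around this, sketched here under stated assumptions, is to persist the C++ source text rather than the compiled function, and to recompile it after reading it back. The helper name `restore_compiled` is hypothetical, and calling it assumes that Rcpp and a working C++ toolchain are available in the new session.

```r
# Save the source code (a plain character string), not the compiled function.
src <- "int one() { return 1; }"
path <- tempfile(fileext = ".rds")
saveRDS(src, path)

# Hypothetical helper: read the source back and recompile it with Rcpp.
# Assumes the Rcpp package and a C++ compiler are installed.
restore_compiled <- function(path, env = globalenv()) {
  code <- readRDS(path)
  Rcpp::cppFunction(code, env = env)  # defines one() in `env`
}

# In a fresh R session one would then run:
# restore_compiled(path)
# one()
```

The string survives serialization like any other R object; only the pointer to the compiled code is tied to the session that created it.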

#2


5  

saveRDS() provides a far better solution to the problem, and to the general one of saving and loading objects created with R. saveRDS() serializes an R object into a format that can be saved. Wikipedia describes serialization thus:

…serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment.

save() does the same thing, but with one important difference: saveRDS() doesn’t save both the object and its name; it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized.

We can illustrate this using the model below:

ls()
## [1] "mod"
saveRDS(mod, "mymodel.rds")
mod2 <- readRDS("mymodel.rds")
ls()
## [1] "mod"  "mod2"
identical(mod, mod2, ignore.environment = TRUE)
## [1] TRUE

(Note that the two model objects have different environments within their representations so we have to ignore this when testing their identity.)
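
The naming difference between the two approaches can be seen side by side. This is a minimal sketch using a small named vector and temporary files; the variable names (`x`, `y`, `e`) are illustrative only.

```r
x <- c(a = 1, b = 2)
rda_path <- tempfile(fileext = ".rda")
rds_path <- tempfile(fileext = ".rds")

save(x, file = rda_path)   # stores the object together with its name "x"
saveRDS(x, rds_path)       # stores only the object

# load() recreates the object under its original name and returns that name.
e <- new.env()
loaded <- load(rda_path, envir = e)
loaded

# readRDS() returns the object, so the caller picks the name.
y <- readRDS(rds_path)
identical(e$x, y)
```

Loading into a separate environment (`e`) avoids silently overwriting an existing `x`, which is the usual risk with load().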

You’ll notice that in the call to saveRDS() I named the file with the extension .rds. This appears to be the convention used for serialized objects of this sort; R uses this representation often, for example for package meta-data and the databases used by help.search(). In contrast, the extension .rda is often used for objects serialized via save().

So there you have it; saveRDS() and readRDS() are the newest additions to my day-to-day workflow.

Note: saveRDS() isn’t a drop-in replacement for save(). The main difference is that save() can save many objects to a file in a single call, whilst saveRDS(), being a lower-level function, works with a single object at a time. This is a feature for me given the above use-case, but if you find yourself saving more than a couple of objects at a time, saveRDS() may not be ideal for you. The second significant difference is that saveRDS() forgets the original name of the object; in the use-case above this is also an advantage. If maintaining the original name is important to you, then there is no reason to switch from save() to saveRDS().
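
If several objects do need to go into one .rds file, one common approach is to wrap them in a named list. The sketch below assumes two small illustrative objects (`a` and `b`); it is one possible pattern rather than something the answer itself prescribes.

```r
a <- 1:5
b <- letters[1:3]

path <- tempfile(fileext = ".rds")
saveRDS(list(a = a, b = b), path)   # a single list wraps both objects

bundle <- readRDS(path)
bundle$a
bundle$b

# Optionally copy the elements into an environment under their list names.
env <- new.env()
list2env(bundle, envir = env)
ls(env)
```

The list names take over the role that object names play with save(), so the original names can still be recovered when they matter.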
