Global variables in packages in R

Time: 2021-10-07 23:09:13

I'm developing a package in R. I have a bunch of functions, and some of them need global variables. How do I manage global variables in a package?

I've read something about environments, but I do not understand how they work, or if this is even the right way to go about it.

4 answers

#1


9  

In general, global variables are evil. The underlying reason is that you want to minimize the interconnections in your package. These interconnections often give functions side effects and hidden inputs, i.e. the outcome depends not only on the input arguments but also on the value of some global variable. Especially as the number of functions grows, this can be hard to get right and hell to debug.
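To make the hidden-input problem concrete, here is a minimal sketch. The names `rate`, `add_tax_global` and `add_tax_explicit` are invented for illustration and are not part of the question.

```r
# Hypothetical example: `rate` is a global variable the function reads silently.
rate <- 0.1

add_tax_global <- function(price) {
  price * (1 + rate)   # result depends on whatever `rate` currently is
}

add_tax_explicit <- function(price, rate) {
  price * (1 + rate)   # result depends only on the arguments
}

a <- add_tax_global(100)
rate <- 0.2                 # some other code changes the global
b <- add_tax_global(100)    # same call as before, different result

c1 <- add_tax_explicit(100, rate = 0.1)
c2 <- add_tax_explicit(100, rate = 0.1)   # same call, same result
```

The two calls to `add_tax_global(100)` look identical in the code but disagree, which is exactly the kind of behavior that becomes hard to track down in a larger package.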

For global variables in R see this SO post.

Edit in response to your comment: An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:

token_information = list(token1 = "087091287129387",
                         token2 = "UA2329723")

and require all functions that need this information to have it as an argument:

do_stuff = function(arg1, arg2, token) {
    # ... use token$token1 and token$token2 here ...
}
do_stuff(arg1, arg2, token = token_information)

In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:

token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)
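The answer does not show what `create_token` or the body of `do_stuff` would look like, so the following is only one possible sketch. The validation, the `token_info` class name, and the string that `do_stuff` returns are all assumptions made for illustration.

```r
# Hypothetical constructor: bundles both tokens into one validated object.
create_token <- function(token1, token2) {
  stopifnot(is.character(token1), length(token1) == 1,
            is.character(token2), length(token2) == 1)
  structure(list(token1 = token1, token2 = token2), class = "token_info")
}

# Hypothetical worker: reads everything it needs from its arguments.
do_stuff <- function(arg1, arg2, token) {
  stopifnot(inherits(token, "token_info"))
  paste(arg1, arg2, token$token1, token$token2, sep = "/")
}

token_info <- create_token("087091287129387", "UA2329723")
result <- do_stuff("a", "b", token_info)
```

Giving the object a class lets each function check that it really received token information, so a wrong argument fails early instead of producing a confusing error further down.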

I hope this makes things clearer.

#2


39  

You can use package-local variables through an environment. These variables will be available to multiple functions in the package, but they are not (easily) accessible to the user and will not interfere with the user's workspace. A quick and simple example is:

pkg.env <- new.env()

pkg.env$cur.val <- 0
pkg.env$times.changed <- 0

inc <- function(by=1) {
    pkg.env$times.changed <- pkg.env$times.changed + 1
    pkg.env$cur.val <- pkg.env$cur.val + by
    pkg.env$cur.val
}

dec <- function(by=1) {
    pkg.env$times.changed <- pkg.env$times.changed + 1
    pkg.env$cur.val <- pkg.env$cur.val - by
    pkg.env$cur.val
}

cur <- function(){
    cat('the current value is', pkg.env$cur.val, 'and it has been changed', 
        pkg.env$times.changed, 'times\n')
}

inc()
inc()
inc(5)
dec()
dec(2)
inc()
cur()
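A variation on the same idea, sketched under a few assumptions: the names `pkg_state`, `set_val` and `get_val` are invented here, and using `emptyenv()` as the parent is a common convention rather than something the answer requires. Run as a script, `pkg_state` itself sits in the workspace; inside a real package it would live in the package namespace instead.

```r
# Private state for a hypothetical package. With emptyenv() as the parent,
# lookups in this environment never fall through to other environments.
pkg_state <- new.env(parent = emptyenv())
pkg_state$cur_val <- 0

# Setter: stores the new value and invisibly returns the previous one.
set_val <- function(value) {
  old <- pkg_state$cur_val
  pkg_state$cur_val <- value
  invisible(old)
}

# Getter: the only way other code is meant to read the value.
get_val <- function() pkg_state$cur_val

set_val(5)
x <- get_val()

# The stored variable does not appear in the user's global environment.
leaked <- exists("cur_val", envir = globalenv(), inherits = FALSE)
```

Routing all reads and writes through a small pair of accessor functions keeps the shared state in one place, which makes it easier to see which functions depend on it.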

#3


14  

You could set an option, e.g.:

options("mypkg-myval"=3)
1+getOption("mypkg-myval")
[1] 4
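If the option approach is used inside a package, one common pattern is to set defaults when the package loads without overwriting anything the user has already set. The sketch below assumes a hypothetical package called `mypkg` and an option named `mypkg.myval`; the `.onLoad` hook and `getOption(x, default)` are standard R, the rest is illustrative.

```r
# Hypothetical package "mypkg": set default options on load, but keep any
# value the user has already chosen.
.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.myval = 3)
  to_set <- !(names(defaults) %in% names(options()))
  if (any(to_set)) options(defaults[to_set])
  invisible()
}

# Accessor with a fallback in case the option was removed later.
get_myval <- function() getOption("mypkg.myval", default = 3)

# R calls .onLoad automatically when a package is loaded; it is called by
# hand here only so the sketch can be run as a plain script.
.onLoad(NULL, "mypkg")
val <- 1 + get_myval()
```

Prefixing the option name with the package name reduces the chance of clashing with options set by other packages.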

#4


2  

The question is unclear:

  • Just one R process or several?

  • Just on one host, or across several machines?

  • Is there common file access among them or not?

In increasing order of complexity, I'd use a file, a SQLite backend via the RSQLite package, or (my favourite :) the rredis package to write to / read from a Redis instance.
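For the simplest of these options, a file, a minimal sketch using only base R might look like the following. The file location, the function names, and the write-then-rename step are assumptions made for illustration; there is no locking here, so it is only suitable when concurrent writers are not a concern.

```r
# Hypothetical shared location; in practice this would be a path that every
# participating R process can reach.
state_file <- file.path(tempdir(), "mypkg_state.rds")

write_state <- function(state, path = state_file) {
  tmp <- paste0(path, ".tmp")
  saveRDS(state, tmp)       # write to a temporary file first
  file.rename(tmp, path)    # then move it into place
  invisible(path)
}

read_state <- function(path = state_file, default = list()) {
  if (!file.exists(path)) return(default)   # nothing stored yet
  readRDS(path)
}

write_state(list(counter = 1))
state <- read_state()
```

The SQLite and Redis routes follow the same write/read shape but add what a plain file lacks, such as safer concurrent access and, for Redis, access across machines.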
