R: Linking to data in package B from source code in package A

Date: 2022-07-05 16:19:47

I'm building a new R package which has a lot of internal data. I decided to split the package into two pieces: A (houses the source code) and B (houses the data). This is in line with official CRAN policies. The data is crucial and the package cannot function without it.

I'm having trouble linking to the data in B from source code in A. All source code in A is in directory R/ and all data in B is in directory data/. Let's assume that dat is the only data file in B. I tried the following:

  1. Enable LazyData: true in the DESCRIPTION file for B, which exports the data files. In A, I access the data in source code via B::dat. PROBLEM: R CMD check raises a NOTE that B::dat has not been defined, and CRAN maintainers claim that this is bad practice.

  2. Save all data in B into sysdata.rda in the R/ directory and refer to it using B:::dat from A. PROBLEM: CRAN maintainers claim that this is bad practice since all data should be in data/, not in R/. Also, this way you cannot document the data files in man/.

  3. Data in B cannot be exported through the NAMESPACE file with an export(dat) directive.

  4. data(dat, package="B") loads the data, but into the global environment. As internal data it should not be visible to the user, so this won't work.

  5. data(dat, package="B", envir=environment()) loads the data into the local environment of the function call, but the data is reloaded every time the function is called (which can be many times), and loading takes long enough to make the calculations too slow. I also tried loading into the package namespace directly, but namespaces are locked, so this is not allowed. How can we get the data to load directly into the package namespace?
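
The locking behaviour described in item 5 can be reproduced without building a package. The sketch below is only an illustration: fake_ns is a hypothetical stand-in for A's namespace, and .store is a made-up name for an environment created before locking. It suggests that a locked environment refuses new bindings, while an environment stored inside it can still be filled.

```r
# 'fake_ns' is a hypothetical stand-in for A's namespace; R locks real
# namespaces in a similar way once a package has been loaded.
fake_ns <- new.env()

# '.store' (assumed name) is created before locking, the way a top-level
# assignment in A's R/ directory would be.
assign(".store", new.env(parent = emptyenv()), envir = fake_ns)
lockEnvironment(fake_ns, bindings = TRUE)

# Adding a new binding to the locked environment fails ...
direct <- tryCatch(
  {
    assign("dat", 1:3, envir = fake_ns)
    "assigned"
  },
  error = function(e) "refused"
)

# ... but the environment stored inside it is not itself locked.
assign("dat", 1:3, envir = fake_ns$.store)
cached <- get("dat", envir = fake_ns$.store)
```

Under these assumptions, direct records the refused assignment and cached holds the value that went into the inner environment.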

Any suggestions on how to go about this? What is the correct way to do this? Ideally the data is in the data/ directory in package B and the source code in package A has no problems in accessing it. Thank you!

2 Answers

#1


1  

A very simple solution that works:

Package A's DESCRIPTION file should contain Depends: B. I found that Imports: B does not work for importing data files unless they are in sysdata.rda. Also, package B's NAMESPACE file should contain the following, which is the default:

exportPattern(".")

This way you can refer to dat directly in any source code in package A.
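
To illustrate why a bare name can resolve: Depends: B causes B to be attached whenever A is attached, so B's lazy-loaded data sits on the search path. The following is a rough sketch rather than the actual package setup. It uses the built-in datasets package as a stand-in for B and mtcars as a stand-in for dat, and count_rows is a hypothetical function name.

```r
# Stand-ins (assumptions): 'datasets' plays package B, 'mtcars' plays dat.
# Depends: B has roughly the effect of this library() call when A is attached.
library(datasets)

# Written as it might appear in A's R/ directory: 'mtcars' is a bare name.
count_rows <- function() {
  nrow(mtcars)
}

# The attached package is on the search path ...
on_search_path <- "package:datasets" %in% search()

# ... and the data object is visible in the attached package environment.
visible <- exists("mtcars",
                  envir = as.environment("package:datasets"),
                  inherits = FALSE)

n <- count_rows()
```

Note that this relies on B being attached, so the data is visible to the user of A as well.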

#2


0  

Why not keep it simple and be explicit: data(someData, package="B")?
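
A minimal sketch of the explicit approach, under stated assumptions: the built-in datasets package stands in for the data package and mtcars for the data set, while .cache and get_dat are hypothetical names. The data() call targets a private environment instead of the global one, and it runs only on first use, so repeated function calls do not reload the data.

```r
# '.cache' (assumed name) would live at top level in the code package's R/ files.
.cache <- new.env(parent = emptyenv())

# 'get_dat' (assumed name) loads the data set on first use and reuses it afterwards.
get_dat <- function() {
  if (!exists("mtcars", envir = .cache, inherits = FALSE)) {
    # Explicit load from the data package into the private environment.
    utils::data(list = "mtcars", package = "datasets", envir = .cache)
  }
  get("mtcars", envir = .cache, inherits = FALSE)
}

first  <- get_dat()  # triggers the load
second <- get_dat()  # served from the cache
```

Functions in the code package would then call get_dat() wherever they need the data, which keeps the object out of the user's workspace.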

Having it work magically (as it does when data comes from the same package) may be possible; there are some packages (such as SOAR) doing tricks with lazy-loading and on-demand loading.

Edit: Come to think of it, there are a number of data packages on CRAN. Did you study those? E.g., TH.data was set up by Torsten for use in his package. Isn't that your use case? Maybe you'll find the relevant trick in its setup.
