What should I do with imperfect but useful functions?

Time: 2022-03-14 17:42:32

I could equally have titled this question, "Is it good enough for CRAN?"

I have a collection of functions that I've built up for specific tasks. Some of these are convenience functions:

# Return the odd / even entries of an integer vector
# (uses the base %% operator; fpart() is not part of base R)
odds <- function(vec) {
    stopifnot(is.integer(vec))
    vec[vec %% 2L != 0L]
}
evens <- function(vec) {
    stopifnot(is.integer(vec))
    vec[vec %% 2L == 0L]
}

Some are minor additions that have proven useful in answering common SO questions:

# Shift a vector over by n spots (n > 0 shifts left, n < 0 shifts right)
# wrap moves the entries that fall off one end onto the other end
# pad does nothing unless wrap is FALSE, in which case it specifies whether to pad with NAs
shift <- function(vec, n = 1, wrap = TRUE, pad = FALSE) {
    len <- length(vec)
    if (len < abs(n)) {
        stop("Length of vector must be at least the magnitude of n")
    }
    if (n == 0) {
        return(vec)
    } else if (len == abs(n)) {
        # return empty
        length(vec) <- 0
        return(vec)
    } else if (n > 0) {
        returnvec <- vec[seq(n + 1, len)]
        if (wrap) {
            returnvec <- c(returnvec, vec[seq_len(n)])
        } else if (pad) {
            returnvec <- c(returnvec, rep(NA, n))
        }
    } else {
        k <- abs(n)
        returnvec <- vec[seq_len(len - k)]
        if (wrap) {
            returnvec <- c(vec[seq(len - k + 1, len)], returnvec)
        } else if (pad) {
            returnvec <- c(rep(NA, k), returnvec)
        }
    }
    returnvec
}
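
For comparison, the same shifting idea can probably be written more compactly with head() and tail(). This is only a sketch, not the poster's code: the name shift_compact is made up for illustration, and unlike shift() above it treats a full-length wrap as returning the vector unchanged rather than an empty vector.

```r
# Hypothetical compact variant of shift(); shift_compact is an illustrative name.
shift_compact <- function(vec, n = 1L, wrap = TRUE, pad = FALSE) {
    len <- length(vec)
    stopifnot(abs(n) <= len)
    if (n == 0 || len == 0) return(vec)
    k <- abs(n)
    if (n > 0) {
        kept    <- tail(vec, len - k)   # entries that stay in the vector
        dropped <- head(vec, k)         # entries that fall off the front
        fill <- if (wrap) dropped else if (pad) rep(NA, k) else NULL
        c(kept, fill)
    } else {
        kept    <- head(vec, len - k)   # entries that stay in the vector
        dropped <- tail(vec, k)         # entries that fall off the back
        fill <- if (wrap) dropped else if (pad) rep(NA, k) else NULL
        c(fill, kept)
    }
}

shift_compact(1:5, 2)                            # 3 4 5 1 2
shift_compact(1:5, -2)                           # 4 5 1 2 3
shift_compact(1:5, 2, wrap = FALSE, pad = TRUE)  # 3 4 5 NA NA
```

Computing the fill once per branch keeps the wrap/pad decision in a single place, which may make the behaviour easier to document in a package help page.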

The most important are extensions to existing classes that can't be found anywhere else (e.g. a CDF panel function for lattice plots, various xtable and LaTeX output functions, classes for handling and converting between geospatial object types and performing various GIS-like operations such as overlays).

I would like to make these available somewhere on the internet in R-ized form (e.g. posting them on a blog as plain text functions is not what I'm looking for), so that maintenance is easier and so that I and others can access them from any computer that I go to. The logical thing to do is to make a package out of them and post them to CRAN--and indeed I already have them packaged up. But is this collection of functions suitable for a CRAN package?

I have two main concerns:

  1. The functions don't seem to have any coherent overlay. It's just a collection of functions that do lots of different things.
  2. My code isn't always the prettiest. I've tried to clean it up as I learned better coding practices, but producing R Core-worthy beautiful code is not in the cards.

The CRAN webpage is surprisingly bereft of guidelines on posting. Should I post to CRAN, given that some people will find it useful but that it will in some sense forever lock R into having some pretty basic function names taken up? Or is there another place I can use an install.packages-like command to install from? Note I'd rather avoid posting the package to a webpage and having people have to memorize the URL to install the package (not least for version control issues).

3 answers

#1


4  

Most packages should be collections of related functions with an obvious purpose, so a useful thing to do would be to try and group what you have together, and see if you can classify them. Several smaller packages are better than one huge incoherent package.

That said, there are some packages that are collections of miscellaneous utility functions, most notably Hmisc and gregmisc, so it is okay to do that sort of thing. If you just have a few functions like that, it might be worth contacting the author of some of the misc packages and seeing if they'll let you include your code in their package.

As for writing pretty code, the most important thing you can do is to use a style guide.

#2


5  

I would use http://r-forge.r-project.org/. From the top of the page:

R-Forge offers a central platform for the development of R packages, R-related software and further projects. It is based on FusionForge offering easy access to the best in SVN, daily built and checked packages, mailing lists, bug tracking, message boards/forums, site hosting, permanent file archival, full backups, and total web-based administration.
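
As a rough illustration of the "install.packages-like command" the question asks for: once a package is hosted and built on R-Forge, it can usually be installed by pointing install.packages() at the R-Forge repository. The package name "myutils" and the wrapper install_from_rforge below are hypothetical, and the call only succeeds if R-Forge currently offers a build of that package.

```r
# Sketch only: "myutils" is a hypothetical package name, not a real project.
rforge_repo <- "http://R-Forge.R-project.org"

# Small wrapper (illustrative name) so the repository URL lives in one place.
install_from_rforge <- function(pkg, repo = rforge_repo) {
    install.packages(pkg, repos = repo)
}

# Not run here, because it needs network access and an existing R-Forge build:
# install_from_rforge("myutils")
```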

#3


0  

In my opinion it is not a good idea to make this type of material into packages.
Misc packages do exist, but mostly for historical reasons and/or because of their authoritative contributors; see Frank Harrell's Hmisc.

I see three main reasons why this choice does not fit a disparate collection of functions.

  1. There are roughly 7000 packages on CRAN alone. It is unlikely that your package will be chosen if it does not target a specific field and, even when it does, it is quite possible that other established packages do the same. Therefore your package should also offer an original or better solution to the problem it deals with.

  2. Repositories, and CRAN in particular, are task-oriented, which suggests that a package's functions should address a coherent task. And for a good reason: there is no point in downloading a whole package with, say, 50 autonomous functions when I need just a couple of them. If instead a package solves a specific data problem of mine, then I will most likely need most (if not all) of them.

  3. R repositories tend to mask the content. Contrary to tech blogs, you do not immediately see the functions' source. You need to download a separate source package, and the package structure adds a lot of overhead that buries the actual functions you want to show and that others need to read.

In my opinion the best place for general convenience functions is a site like GitHub. In fact:

  1. One can read them immediately, with the comfort of syntax highlighting. If they look interesting, they can be pasted into R to try out and possibly keep; otherwise one simply moves on to the next function.

  2. There is the possibility of organising code, but without all the constraints of an actual package. Similar functions might go in the same file and coherent files in the same subfolder.

  3. You can show your ideas to others in a simple way. The readme file immediately becomes a sort of mini webpage (via markdown).

There are a lot of other benefits (revision history, accepting contributions, GitHub pages), which may or may not interest you.
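
To make the "files and subfolders" idea concrete, here is a minimal sketch of how a local copy of such a repository could be loaded in one go. The folder layout, the file name parity.R and the temporary directory standing in for a checkout are all assumptions for illustration; the helper file is written on the fly so the example is self-contained.

```r
# Stand-in for a local checkout of a repository of loose helper functions
# (hypothetical layout: one subfolder per topic, one or more .R files in each).
helpers_dir <- file.path(tempdir(), "r-helpers")
dir.create(file.path(helpers_dir, "vectors"), recursive = TRUE, showWarnings = FALSE)
writeLines("evens <- function(vec) vec[vec %% 2L == 0L]",
           file.path(helpers_dir, "vectors", "parity.R"))

# Source every .R file into its own environment, so nothing in the
# global workspace is overwritten by a helper with a common name.
helpers <- new.env()
r_files <- list.files(helpers_dir, pattern = "\\.R$",
                      recursive = TRUE, full.names = TRUE)
for (f in r_files) sys.source(f, envir = helpers)

helpers$evens(1:10)  # 2 4 6 8 10
```

Keeping the helpers in a separate environment also softens the concern about taking up basic function names, since nothing is attached to the search path unless the user chooses to do so.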

Of course, once several functions grow in a stable, coherent direction, you can turn them into an actual CRAN package, not least because trying them out by copy and paste becomes inconvenient by then.
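
When that point is reached, one possible starting point is package.skeleton() from the utils package, which writes a package directory from functions held in an environment. The sketch below is only an assumption about how one might begin: the package name "parityutils" and the two helpers are made up for illustration, and the generated DESCRIPTION and help files still need editing by hand before the package passes R CMD check.

```r
# Hypothetical helpers collected in their own environment.
pkg_env <- new.env()
pkg_env$odds  <- function(vec) vec[vec %% 2L != 0L]
pkg_env$evens <- function(vec) vec[vec %% 2L == 0L]

# Write the skeleton into a fresh temporary folder (illustrative location).
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)
utils::package.skeleton(name = "parityutils",
                        list = c("odds", "evens"),
                        environment = pkg_env,
                        path = pkg_parent)

# Inspect what was generated (DESCRIPTION, NAMESPACE, R/, man/, ...).
list.files(file.path(pkg_parent, "parityutils"))
```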

Of course I realise that what I am writing could partly be a matter of individual preference.
