To celebrate the 100,000th question in the r tag, I'd like to create a list of the names of all package authors on CRAN.
Initially, I thought I could do this using available.packages(), but sadly this doesn't contain a column of the authors.
pdb <- available.packages()
colnames(pdb)
[1] "Package" "Version" "Priority"
[4] "Depends" "Imports" "LinkingTo"
[7] "Suggests" "Enhances" "License"
[10] "License_is_FOSS" "License_restricts_use" "OS_type"
[13] "Archs" "MD5sum" "NeedsCompilation"
[16] "File" "Repository"
This information is available in the DESCRIPTION file for each package. So I can think of two brute-force ways, neither of which is very elegant:
- Download each of the 6,878 packages and read the DESCRIPTION file using base::read.dcf()
- Scrape each of the package pages on CRAN. For example, https://cran.r-project.org/web/packages/MASS/index.html tells me that Brian Ripley is the author of MASS.
I don't want to download all of CRAN to answer this question. And I don't want to scrape the HTML either, since the information in the DESCRIPTION file is a neatly formatted list of person objects (see ?person).
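As a small illustration of that format, the sketch below reads a made-up DESCRIPTION fragment with read.dcf() and turns its Authors@R field into a person object. The package name, author and e-mail address are hypothetical, and not every package declares Authors@R; some only carry a free-text Author field.

```r
# Hypothetical DESCRIPTION content; real files carry more fields
desc_lines <- c(
  "Package: demo",
  "Version: 0.1.0",
  'Authors@R: person("Ada", "Example", role = c("aut", "cre"), email = "ada@example.org")'
)

# read.dcf() accepts a connection, so no file on disk is needed here
con <- textConnection(desc_lines)
fields <- read.dcf(con, fields = c("Package", "Authors@R"))
close(con)

# The field is R code stored as text, so it has to be parsed and evaluated
author <- eval(parse(text = fields[1, "Authors@R"]))

class(author)
format(author, include = c("given", "family"))
```

Because the field is evaluated as code, this is only something to do with input you trust.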
How can I use the information on CRAN to easily build a list of package authors?
2 answers
#1
6
Taken from reverse_dependencies_with_maintainers, which was available at one point on the R developer site (I don't see it there now):
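# Location of the package metadata file on the configured CRAN mirror
# (assumes getOption("repos")["CRAN"] is a real URL, not "@CRAN@")
description <- sprintf("%s/web/packages/packages.rds",
                       getOption("repos")["CRAN"])
# Open a local file or a remote URL, depending on the mirror
con <- if (substring(description, 1L, 7L) == "file://") {
  file(description, "rb")
} else {
  url(description, "rb")
}
db <- as.data.frame(readRDS(gzcon(con)), stringsAsFactors = FALSE)
close(con)
rownames(db) <- NULL

head(db$Author)
head(db$"Authors@R")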
Where Authors@R exists, it might be parseable into something better using dget():
getAuthor <- function(x) {
  if (is.na(x)) return(NA)
  a <- textConnection(x)
  on.exit(close(a))
  # dget() evaluates the text as R code and returns the person object(s)
  dget(a)
}
authors <- lapply(db$"Authors@R", getAuthor)
head(authors)
[[1]]
[1] NA
[[2]]
[1] "Gaurav Sood <gsood07@gmail.com> [aut, cre]"
[[3]]
[1] "Csillery Katalin <kati.csillery@gmail.com> [aut]"
[2] "Lemaire Louisiane [aut]"
[3] "Francois Olivier [aut]"
[4] "Blum Michael <michael.blum@imag.fr> [aut, cre]"
[[4]]
[1] NA
[[5]]
[1] "Csillery Katalin <kati.csillery@gmail.com> [aut]"
[2] "Lemaire Louisiane [aut]"
[3] "Francois Olivier [aut]"
[4] "Blum Michael <michael.blum@imag.fr> [aut, cre]"
[[6]]
[1] NA
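To get from that list to a flat set of names, something along these lines might work. It is a minimal sketch: authors_demo is a hypothetical stand-in for the authors list (so it runs without a network connection), and the names in it are made up.

```r
# Hypothetical stand-in for the `authors` list returned by lapply():
# NA where Authors@R is missing, person objects otherwise
authors_demo <- list(
  NA,
  person("Ada", "Example", role = c("aut", "cre")),
  c(person("Bob", "Sample", role = "aut"),
    person("Ada", "Example", role = "ctb"))
)

# Keep only the entries that actually parsed into person objects
is_person <- vapply(authors_demo, inherits, logical(1), what = "person")

# format() with include = c("given", "family") drops e-mails and roles
name_list <- lapply(authors_demo[is_person], format,
                    include = c("given", "family"))

# One entry per distinct name across all packages
all_names <- sort(unique(unlist(name_list)))
all_names
```

Packages without Authors@R are simply skipped here; for those, the free-text Author column would have to be cleaned up separately.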
#2
7
Why not use Gabor's API for CRAN packages?
e.g. http://crandb.r-pkg.org/MASS
library("httr")
content(GET("http://crandb.r-pkg.org/MASS"))$Author
[1] "Brian Ripley [aut, cre, cph],\nBill Venables [ctb],\nDouglas M. Bates [ctb],\nKurt Hornik [trl] (partial port ca 1998),\nAlbrecht Gebhardt [trl] (partial port ca 1998),\nDavid Firth [ctb]"
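The Author field comes back as one free-text string, so it still needs splitting into individual names. Here is a rough sketch using the MASS string shown above as input; split_author_field is a hypothetical helper, and the regular expressions are a heuristic that will mishandle names or comments containing commas.

```r
# Hypothetical helper: split a free-text Author field into bare names
split_author_field <- function(x) {
  x <- gsub("\\[.*?\\]", "", x, perl = TRUE)  # drop role tags such as [aut, cre]
  x <- gsub("\\(.*?\\)", "", x, perl = TRUE)  # drop parenthesised comments
  x <- gsub("<.*?>", "", x, perl = TRUE)      # drop e-mail addresses
  # Split on commas, newlines, or the standalone word "and"
  parts <- strsplit(x, ",|\n|\\band\\b", perl = TRUE)[[1]]
  parts <- trimws(parts)
  parts[nzchar(parts)]
}

# The Author string returned for MASS in the output above
mass_author <- "Brian Ripley [aut, cre, cph],\nBill Venables [ctb],\nDouglas M. Bates [ctb],\nKurt Hornik [trl] (partial port ca 1998),\nAlbrecht Gebhardt [trl] (partial port ca 1998),\nDavid Firth [ctb]"

mass_names <- split_author_field(mass_author)
mass_names
```

Role tags are removed before splitting because they contain commas themselves.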