I have an object (variable rld) which looks a bit like a "data.frame" (see further down the post for details) in that it has columns that can be accessed using $ or [[]].
I have a vector groups containing the names of some of its columns (3 in the example below).
I generate strings based on combinations of elements in the columns as follows:
paste(rld[[groups[1]]], rld[[groups[2]]], rld[[groups[3]]], sep="-")
I would like to generalize this so that I don't need to know how many elements are in groups.
The following attempt fails:
> paste(rld[[groups]], collapse="-")
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, error.if.nomatch = FALSE) :
attempt to extract more than one element
Here is how I would do it in a functional style with a Python dictionary:
map("-".join, zip(*map(rld.get, groups)))
Is there a similar column-getter operator in R?
As suggested in the comments, here is the output of dput(rld): http://paste.ubuntu.com/23528168/ (I could not paste it directly, since it is huge.)
This was generated using the DESeq2 bioinformatics package, more precisely by doing something similar to what is described on page 28 of this document: https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf.
DESeq2 can be installed from bioconductor as follows:
source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")
Reproducible example
One of the solutions worked when running in interactive mode, but failed when the code was put in a library function, with the following error:
Error in do.call(function(...) paste(..., sep = "-"), colData(rld)[groups]) :
second argument must be a list
After some tests, it appears that the problem doesn't occur if the function is in the main calling script, as follows:
library(DESeq2)
library(test.package)
lib_names <- c(
    "WT_1",
    "mut_1",
    "WT_2",
    "mut_2",
    "WT_3",
    "mut_3"
)
file_names <- paste(
    lib_names,
    "txt",
    sep="."
)
wt <- "WT"
mut <- "mut"
genotypes <- rep(c(wt, mut), times=3)
replicates <- c(rep("1", times=2), rep("2", times=2), rep("3", times=2))
sample_table = data.frame(
    lib = lib_names,
    file_name = file_names,
    genotype = genotypes,
    replicate = replicates
)
dds_raw <- DESeqDataSetFromHTSeqCount(
    sampleTable = sample_table,
    directory = ".",
    design = ~ genotype
)
# Remove genes with too few read counts
dds <- dds_raw[ rowSums(counts(dds_raw)) > 1, ]
dds$group <- factor(dds$genotype)
design(dds) <- ~ replicate + group
dds <- DESeq(dds)
test_do_paste <- function(dds) {
    require(DESeq2)
    groups <- head(colnames(colData(dds)), -2)
    rld <- rlog(dds, blind=FALSE)
    stopifnot(all(groups %in% names(colData(rld))))
    combined_names <- do.call(
        function (...) paste(..., sep = "-"),
        colData(rld)[groups]
    )
    print(combined_names)
}
test_do_paste(dds)
# This fails (with the same function put in a package)
#test.package::test_do_paste(dds)
The error occurs when the function is packaged as in https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
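For readers hitting the same message, here is a minimal base-R sketch of where the wording "second argument must be a list" comes from. It uses a plain character vector as a stand-in; that the packaged function ended up calling base do.call on a non-list S4 object is only an assumption, not something this post confirms.

```r
# base::do.call() rejects a second argument that is not a list.
msg <- tryCatch(
  base::do.call(paste, c("a", "b")),
  error = function(e) conditionMessage(e)
)

# Coercing the arguments to a plain list first passes that check.
ok <- base::do.call(paste, c(as.list(c("a", "b")), sep = "-"))
```

If the assumption holds, coercing the columns with as.list() before calling do.call inside the package would be one thing to try.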
Data used in the example:
- WT_1.txt
- WT_2.txt
- WT_3.txt
- mut_1.txt
- mut_2.txt
- mut_3.txt
I posted this issue as a separate question: do.call error "second argument must be a list" with S4Vectors when the code is in a library
Although I have an answer to my initial question, I'm still interested in alternative solutions for the "column extraction using a vector of column names" issue.
1 Answer
We may use either of the following:
do.call(function (...) paste(..., sep = "-"), rld[groups])
do.call(paste, c(rld[groups], sep = "-"))
We can consider a small, reproducible example:
rld <- mtcars[1:5, ]
groups <- names(mtcars)[c(1,3,5,6,8)]
do.call(paste, c(rld[groups], sep = "-"))
#[1] "21-160-3.9-2.62-0" "21-160-3.9-2.875-0" "22.8-108-3.85-2.32-1"
#[4] "21.4-258-3.08-3.215-1" "18.7-360-3.15-3.44-0"
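As a rough check that the two do.call forms agree, and as one possible alternative for the "vector of column names" problem, the sketch below compares them with a Reduce-based version on the same mtcars subset. The Reduce variant is an addition for illustration, not part of the original answer.

```r
rld <- mtcars[1:5, ]
groups <- names(mtcars)[c(1, 3, 5, 6, 8)]

# The two forms from the answer.
a <- do.call(function(...) paste(..., sep = "-"), rld[groups])
b <- do.call(paste, c(rld[groups], sep = "-"))

# Alternative: fold the columns pairwise, left to right.
d <- Reduce(function(x, y) paste(x, y, sep = "-"), rld[groups])

same <- identical(a, b) && identical(a, d)
```

None of the three versions needs to know how many names groups contains.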
Note that it is your responsibility to ensure that all(groups %in% names(rld)) is TRUE; otherwise you get a "subscript out of bounds" or "undefined columns selected" error.
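A minimal sketch of such a check, assuming a plain data.frame and a deliberately wrong name ("not_a_column" is made up for the illustration). It reports the offending names instead of letting the subsetting fail:

```r
rld <- mtcars[1:5, ]
groups <- c("mpg", "cyl", "not_a_column")

# Names requested in groups but absent from the object.
missing_cols <- setdiff(groups, names(rld))

# Fail early with a readable message instead of a subscript error.
msg <- tryCatch({
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  do.call(paste, c(rld[groups], sep = "-"))
}, error = function(e) conditionMessage(e))
```

With valid names, msg would simply hold the pasted strings.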
(I am copying your comment as a follow-up)
It seems the methods you propose don't work directly on my object. However, the package I'm using provides a colData function that makes something more similar to a data.frame:
> class(colData(rld))
[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"
do.call(function (...) paste(..., sep = "-"), colData(rld)[groups]) works, but do.call(paste, c(colData(rld)[groups], sep = "-")) fails with an error message I fail to understand (as too often with R...):
> do.call(paste, c(colData(rld)[groups], sep = "-"))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘mcols’ for signature ‘"character"’
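One possible workaround, sketched here on a plain data.frame, is to turn the selected columns into an ordinary list before c() is called, so that the sep argument is appended to a base list rather than combined with the S4 object. The helper name paste_cols is hypothetical, and it is an assumption that as.list() on the object returned by colData(rld)[groups] yields a list of columns; this was not tested on DESeq2 data.

```r
# Hypothetical helper: paste any number of columns row-wise.
paste_cols <- function(x, cols, sep = "-") {
  # Coerce to a plain list first; unname() so that a column called
  # "sep" or "collapse" is not matched to an argument of paste().
  args <- unname(as.list(x[cols]))
  do.call(paste, c(args, sep = sep))
}

rld <- mtcars[1:5, ]
two   <- paste_cols(rld, c("mpg", "cyl"))
three <- paste_cols(rld, c("mpg", "cyl", "gear"))
```

Under the stated assumption, the call for the object in the question would be paste_cols(colData(rld), groups).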