Looping a function over the levels of a factor variable

Time: 2021-01-15 22:53:41

I have a dataframe, dat, with a covariate site coded as a factor with 31 different levels.

cas_1_sitea_586754968 0 0 1 2 0 sitea 
con_65_sitea_568859302 1 0 2 1 1 siteb
cas_9_siteb_0799700 0 0 0 0 0 siteb 
con_siteb_THR84569 2 0 0 1 0 sitea

I have a function that works when I apply it to one site variable at a time:

get_maf <- function(data){
    allele.count <- apply(data[,1:(ncol(data)-2)],2,sum)
    maf <- allele.count/(2*nrow(data))
    out <- paste((unique(data$site)),"_jp.maf",sep="")
    write.table(maf, out, col.names=F, quote=F)
}

But, when I try to loop over the data within each of the 31 sites using lapply like this:

lapply(unique(dat$site), get_maf, data = dat)    

I get an error:

lapply(unique(jp$site), get_maf_jp, data = jp)
Error in FUN(c("aber", "ajsz", "asrb", "buls", "cati", "caws", "cims", : unused argument (c("aber", "ajsz", "asrb", "buls", "cati", "caws", "cims", "clo3", "cou3", "denm", "dubl", "edin", "egcu", "ersw", "gras", "irwt", "lie2", "lie5", "mgs2", "msaf", "munc", "pewb", "pews", "s234", "swe1", "swe5", "swe6", "top8", "ucla", "umeb", "umes")[[1]])

Any insights into what I am doing wrong here are greatly appreciated.

1 solution

#1


The problem with lapply(unique(dat$site), get_maf, data = dat) is that it passes two arguments to get_maf: the first is the current element of unique(dat$site), supplied by lapply, and the second is data = dat. Since get_maf accepts only data, the site value is left over as an unused argument. You can fix it by subsetting inside an anonymous function: lapply(unique(dat$site), function(s) get_maf(data = dat[dat$site == s, ])).
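
To make the per-site subsetting concrete, here is a minimal sketch on a toy data frame. The toy data, the column names snp1 to snp3 and pheno, and the helper get_maf_vec (a variant of get_maf that returns the frequencies instead of writing a file) are assumptions for illustration, not part of the original question.

```r
# Toy data: three genotype columns (0/1/2 allele counts) followed by two
# trailing columns, so 1:(ncol(data) - 2) selects only the genotypes.
# All names here are hypothetical.
dat <- data.frame(
  snp1  = c(0, 1, 0, 2),
  snp2  = c(0, 0, 0, 0),
  snp3  = c(1, 2, 0, 0),
  pheno = c(1, 0, 1, 0),
  site  = factor(c("sitea", "siteb", "siteb", "sitea"))
)

# Variant of get_maf that returns the vector instead of writing it to disk.
get_maf_vec <- function(data) {
  allele.count <- apply(data[, 1:(ncol(data) - 2)], 2, sum)
  allele.count / (2 * nrow(data))
}

# Subset inside an anonymous function so the helper receives one argument.
sites <- as.character(unique(dat$site))
maf_by_site <- lapply(sites, function(s) get_maf_vec(dat[dat$site == s, ]))
names(maf_by_site) <- sites

# split() builds the per-site data frames directly and gives the same result.
maf_by_split <- lapply(split(dat, dat$site), get_maf_vec)
```

Both lists hold one named vector of frequencies per site. The split() form avoids the manual subsetting and may be easier to read when the function only needs the per-site rows.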

Alternatively, you can use

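library(dplyr)
# group_by() alone does not call get_maf once per group; group_walk() does
# (.keep = TRUE keeps the site column in each per-group data frame).
dat %>% group_by(site) %>% group_walk(~ get_maf(.x), .keep = TRUE)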

PS: if you're dealing with large data sets, consider using allele.count <- colSums(data[, 1:(ncol(data) - 2)]) in get_maf instead of the much slower allele.count <- apply(data[, 1:(ncol(data) - 2)], 2, sum) that you have now.
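
As a quick check that the two forms agree, here is a sketch on a made-up genotype matrix. The matrix geno, its dimensions, and the snp column names are assumptions for illustration.

```r
# Made-up genotype matrix of 0/1/2 allele counts; all names are hypothetical.
set.seed(1)
geno <- matrix(sample(0:2, 1000 * 50, replace = TRUE), nrow = 1000,
               dimnames = list(NULL, paste0("snp", 1:50)))

count_apply   <- apply(geno, 2, sum)  # calls sum() once per column from R
count_colsums <- colSums(geno)        # one vectorised call over all columns

# Same values and names; apply() returns integers here, colSums() doubles.
same_counts <- isTRUE(all.equal(count_apply, count_colsums))
```

Wrapping each call in system.time() on your own data is a simple way to see whether the difference matters at your scale.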
