Fixing R's Slowness Problem

Date: 2023-03-08 23:57:54

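R has long been criticized for being slow. I was recently analyzing a fairly large dataset and it ran unbearably slowly; the culprit turned out to be indexing into R's data frame, which is very slow: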

Here is the test:

gene_list = 1:100000
eQTL_mat = matrix(nrow = length(gene_list), ncol = 7) # create a matrix
eQTL_df = as.data.frame(matrix(nrow = length(gene_list), ncol = 7)) # create a data frame
eQTL_list = replicate(length(gene_list), list()) # create a list
try_func = function() return(1:7) # dummy function that returns one row of 7 values
# test eQTL: baseline, call try_func once per gene without assigning into a container
system.time(
sapply(gene_list, function(x) return (try_func()))
)

 ### user system elapsed

 ### 0.108 0.001 0.108

system.time(
for (gene_ind in 1:length(gene_list)){
eQTL_mat[gene_ind, ] = try_func()
}
)

 ### user system elapsed

 ### 0.137 0.000 0.138

system.time(
for (gene_ind in 1:length(gene_list)){
eQTL_df[gene_ind, ] = try_func()
}
)

  ### user system elapsed

  ### 90.623 165.868 259.065

system.time(
for (gene_ind in 1:length(gene_list)){
eQTL_list[[gene_ind]] = 1:7
}
)

  ### user system elapsed
  ### 0.089 0.000 0.090
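
Since the list loop is this fast, one option is to collect each row in a list and combine the rows afterwards. The following is only a minimal sketch: the names n_genes, row_list and combined_mat are made up for illustration, and the row count is deliberately smaller than in the test above.

```r
n_genes = 1000                         # illustrative size, smaller than the test above
row_list = vector("list", n_genes)     # preallocated list with one slot per gene

# store each 7-value row in its own list slot (cheap, as the list timing shows)
for (gene_ind in seq_len(n_genes)) {
  row_list[[gene_ind]] = 1:7
}

# bind all rows into a single n_genes x 7 matrix in one call
combined_mat = do.call(rbind, row_list)
```

Binding once at the end keeps the per-iteration work down to a plain list assignment.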

 

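Did you see the results? Astonishing! The loop that assigns into the data frame took about 259 seconds elapsed, versus roughly 0.1 seconds for the matrix and the list. Data frames really are not suited to large data, at least when they are filled row by row like this!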
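
If a data frame is still needed as the final result, one possible workaround is to do the row-by-row filling in a matrix and convert it to a data frame once, after the loop. This is a sketch under assumptions, not a measured benchmark: try_func is the same dummy function as in the test, while n_genes, result_mat and result_df are illustrative names and the row count is reduced.

```r
try_func = function() return(1:7)      # same dummy function as in the test above
n_genes = 1000                         # illustrative size, smaller than the test above

# fill a preallocated matrix row by row (the fast case in the timings above)
result_mat = matrix(nrow = n_genes, ncol = 7)
for (gene_ind in seq_len(n_genes)) {
  result_mat[gene_ind, ] = try_func()
}

# convert to a data frame a single time, outside the loop
result_df = as.data.frame(result_mat)
```

The data frame is created only once here, so the slow data frame indexing never runs inside the loop.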