I am trying to apply ddply on a large data.frame (38000 rows / 10 variables), but I am stuck with an error:
ddply(uncertainty.long, .(Species), "nrow")
returns the error:
Error in attributes(out) <- attributes(col) :
'names' attribute [38000] must be the same length as the vector [3800]
> traceback()
11: FUN(1:10[[5L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: (function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
})(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(uncertainty.long, .(Species), "nrow")
Some more details about my data.frame:
> head(uncertainty.long)
Stack Variable PARun Model Species value year scenario GCM sp
1 sync_current Total PA1 GLM Arctosafulvolineata 100.0000 NA <NA> <NA> Arctosa\nfulvolineata
2 sync_cgcm2_B2A_2020 Total PA1 GLM Arctosafulvolineata 134.6840 2020 B2A cgcm2 Arctosa\nfulvolineata
3 sync_cgcm2_B2A_2050 Total PA1 GLM Arctosafulvolineata 153.7617 2050 B2A cgcm2 Arctosa\nfulvolineata
4 sync_cgcm2_B2A_2080 Total PA1 GLM Arctosafulvolineata 195.7176 2080 B2A cgcm2 Arctosa\nfulvolineata
5 sync_mk2_B2A_2020 Total PA1 GLM Arctosafulvolineata 172.2967 2020 B2A mk2 Arctosa\nfulvolineata
6 sync_mk2_B2A_2050 Total PA1 GLM Arctosafulvolineata 198.9391 2050 B2A mk2 Arctosa\nfulvolineata
> str(uncertainty.long)
'data.frame': 38000 obs. of 10 variables:
$ Stack : Factor w/ 19 levels "sync_cgcm2_B2A_2020",..: 7 1 2 3 14 15 16 11 12 13 ...
$ Variable: Factor w/ 5 levels "Lost","NetChange",..: 5 5 5 5 5 5 5 5 5 5 ...
$ PARun : Factor w/ 5 levels "PA1","PA2","PA3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Model : Factor w/ 8 levels "CTA","FDA","GAM",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr "1" "1" "1" "1" ...
$ value : num 100 135 154 196 172 ...
$ year : num NA 2020 2050 2080 2020 2050 2080 2020 2050 2080 ...
$ scenario: chr NA "B2A" "B2A" "B2A" ...
$ GCM : chr NA "cgcm2" "cgcm2" "cgcm2" ...
$ sp : chr "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" ...
This is my sessionInfo():
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] parallel splines grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 Hmisc_3.12-2 Formula_1.1-1 RCurl_1.95-4.1 bitops_1.0-6 biomod2_3.0.3 pROC_1.5.4 plyr_1.8
[9] rpart_4.1-3 randomForest_4.6-7 mda_0.4-4 class_7.3-9 gbm_2.1 survival_2.37-4 nnet_7.3-7 rasterVis_0.21
[17] hexbin_1.26.2 latticeExtra_0.6-26 RColorBrewer_1.0-5 lattice_0.20-23 abind_1.4-0 raster_2.1-49 sp_1.0-13 ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] cluster_1.14.4 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 gtable_0.1.2 labeling_0.2 MASS_7.3-29 munsell_0.4.2 proto_0.3-10 scales_0.2.3
[11] stringr_0.6.2 tools_3.0.1 zoo_1.7-10
I tried to reproduce it with fewer columns (2 columns), but that did not change anything. However, if I reduce the number of rows, it works when the grouping variable "Species" has only one level present:
> small.df <- uncertainty.long[1:3800, ]
> unique(small.df$Species)
[1] Arctosafulvolineata
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> ddply(small.df, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
But if I add one more row:
> small.df <- uncertainty.long[1:3801, ]
> unique(small.df$Species)
[1] Arctosafulvolineata Argyronetaaquatica
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> small.df[3800:3801, ]
Stack Variable PARun Model Species value year scenario GCM sp
3800 sync_hadcm3_A1B_2080 Lost PA5 MAXENT Arctosafulvolineata -54.90872 2080 A1B hadcm3 Arctosa\nfulvolineata
3801 sync_current Total PA1 GLM Argyronetaaquatica 100.00000 NA <NA> <NA> Argyroneta\naquatica
> ddply(small.df, .(Species), "nrow")
Error in attributes(out) <- attributes(col) :
'names' attribute [3801] must be the same length as the vector [3800]
I have found others with a similar problem: https://*.com/a/14162351/2788395.
However, their workaround (reinstalling plyr 1.7 instead of 1.8) did not work for me. Does anyone have an idea of the problem and/or how to solve it?
Thanks!
Problem solved: the issue was with the "names" attribute of the "Species" column. I removed it with the following code and ddply worked:
> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
2 Argyronetaaquatica 3800
3 Dolomedesplantarius 3800
4 Enoplognathamordax 3800
5 Iciussubinermis 3800
6 Neonvalentulus 3800
7 Pardosabifasciata 3800
8 Pardosaoreophila 3800
9 Piratauliginosus 3800
10 Trochosaspinipalpis 3800
1 Answer
The issue was with the "names" attribute of the "Species" column:
$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr "1" "1" "1" "1" ...
I removed it with the following code and ddply worked:
> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
2 Argyronetaaquatica 3800
3 Dolomedesplantarius 3800
4 Enoplognathamordax 3800
5 Iciussubinermis 3800
6 Neonvalentulus 3800
7 Pardosabifasciata 3800
8 Pardosaoreophila 3800
9 Piratauliginosus 3800
10 Trochosaspinipalpis 3800
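As a minimal sketch of the fix in isolation, the snippet below uses a small made-up factor (the name `species` and its values are hypothetical stand-ins for `uncertainty.long$Species`, not the original data) and base R only. It illustrates that the value assigned must be the `NULL` object: assigning the string "NULL" leaves a "names" attribute in place.

```r
# Hypothetical toy factor standing in for uncertainty.long$Species
species <- factor(rep(c("a", "b"), each = 3))

# Attach a stray "names" attribute, similar to the one str() reported
names(species) <- rep("1", length(species))
has_names_before <- !is.null(names(species))

# Assigning the string "NULL" only renames the first element
# (the remaining names become NA); the attribute itself is still there
quoted <- species
names(quoted) <- "NULL"
still_named <- !is.null(names(quoted))

# Assigning the NULL object drops the attribute;
# unname(species) has the same effect on a plain named factor
cleaned <- species
names(cleaned) <- NULL
has_names_after <- !is.null(names(cleaned))

# Group sizes are unaffected by removing the names
counts <- table(cleaned)
```

To check a real data frame for the same problem, inspecting `str()` output for `attr(*, "names")` lines, as in the question, should be enough.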