I usually use the reshape package to aggregate some data (d'uh), typically together with plyr because of its uber-awesome function each. Recently I received a suggestion to switch to reshape2 and try it out, and now I can't seem to use the each wizardry anymore.
reshape
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> cast(m, am + vs ~ variable, each(min, max, mean, sd))
  am vs hp_min hp_max   hp_mean    hp_sd
1  0  0    150    245 194.16667 33.35984
2  0  1     62    123 102.14286 20.93186
3  1  0     91    335 180.83333 98.81582
4  1  1     52    113  80.57143 24.14441
reshape2
require(plyr)
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> dcast(m, am + vs ~ variable, each(min, max, mean, sd))
Error in structure(ordered, dim = ns) :
dims [product 4] do not match the length of object [16]
In addition: Warning messages:
1: In fs[[i]](x, ...) : no non-missing arguments to min; returning Inf
2: In fs[[i]](x, ...) : no non-missing arguments to max; returning -Inf
I wasn't in the mood to dig into this, as my previous code works like a charm with reshape, but I'd really like to know:
- is it possible to use each with dcast?
- is it advisable to use reshape2 at all? Is reshape deprecated?
1 answer
The answer to your first question appears to be no. Quoting from ?reshape2:::dcast:
If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, fun.aggregate. This function should take a vector of numbers and return a single summary statistic.
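To illustrate the quoted requirement: a function that returns one value per group, such as mean, is accepted by dcast, whereas each(min, max, mean, sd) returns four values per group and triggers the error shown in the question. A minimal sketch, assuming reshape2 is installed and reusing the melted mtcars data from the question (the name hp_mean is only for illustration):

```r
library(reshape2)

# Same melted data as in the question
m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# fun.aggregate returns a single summary statistic per am/vs cell,
# which is what dcast expects
hp_mean <- dcast(m, am + vs ~ variable, fun.aggregate = mean)
hp_mean
```

The result has one hp column holding the group means, so each further statistic would need its own dcast call or a different tool.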
A look at Hadley's github page for reshape2 suggests that he knows this functionality was removed, but seems to think it's better done in plyr, presumably with something like:
ddply(m, .(am, vs), summarise,
      min  = min(value),
      max  = max(value),
      mean = mean(value),
      sd   = sd(value))
or if you really want to keep using each:
ddply(m,.(am,vs),function(x){each(min,max,mean,sd)(x$value)})
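If the hp_min, hp_max, ... column names from the old cast output matter to downstream code, they can be restored by renaming after the ddply call. This is only a sketch, assuming plyr and reshape2 are both installed; the names res and stat_cols are illustrative and the "hp" prefix is hard-coded because hp is the only measured variable here:

```r
library(plyr)
library(reshape2)

# Same melted data as in the question
m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# Summarise with each(): one row per am/vs combination,
# one column per statistic (min, max, mean, sd)
res <- ddply(m, .(am, vs), function(x) each(min, max, mean, sd)(x$value))

# Prefix the statistic columns with the measured variable,
# mimicking the names that reshape's cast() produced
stat_cols <- setdiff(names(res), c("am", "vs"))
names(res)[match(stat_cols, names(res))] <- paste("hp", stat_cols, sep = "_")
res
```

With several measured variables, the same idea would need variable added to the grouping and a prefix built per group rather than a fixed string.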