dcast error: 'Aggregation function missing: defaulting to length'

Date: 2021-08-31 16:18:24

My df looks like this:

Id  Task Type    Freq  
3     1    A       2
3     1    B       3
3     2    A       3
3     2    B       0
4     1    A       3
4     1    B       3
4     2    A       1
4     2    B       3

I want to restructure by Id and get:

Id   A    B …  Z    
3    5    3      
4    4    6        

I tried:

df_wide <- dcast(df, Id + Task ~ Type, value.var="Freq")

and got the error:

Aggregation function missing: defaulting to length

I can't figure out what to pass as fun.aggregate. What's the problem?

1 Answer

#1


The reason why you are getting this warning is in the description of fun.aggregate (see ?dcast):

aggregation function needed if variables do not identify a single observation for each output cell. Defaults to length (with a message) if needed but not specified

So, an aggregation function is needed when there is more than one value for one spot in the wide dataframe.

An explanation based on your data:

When you use dcast(df, Id + Task ~ Type, value.var="Freq") you get:

  Id Task A B
1  3    1 2 3
2  3    2 3 0
3  4    1 3 3
4  4    2 1 3

This makes sense, because for each combination of Id, Task and Type there is only one value in Freq. But when you use dcast(df, Id ~ Type, value.var="Freq") you get this (including a warning message):

Aggregation function missing: defaulting to length
  Id A B
1  3 2 2
2  4 2 2
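
The two results above can be reproduced with a short sketch. It assumes dcast comes from the reshape2 package (the question does not say which package is loaded) and rebuilds df by hand from the table in the question:

```r
library(reshape2)  # assumption: dcast() is taken from reshape2

# Rebuild the data frame shown in the question
df <- data.frame(
  Id   = rep(c(3, 4), each = 4),
  Task = rep(c(1, 1, 2, 2), times = 2),
  Type = rep(c("A", "B"), times = 4),
  Freq = c(2, 3, 3, 0, 3, 3, 1, 3)
)

# Id + Task + Type identifies every row, so each output cell gets one value
wide_task <- dcast(df, Id + Task ~ Type, value.var = "Freq")

# Id + Type does not, so dcast falls back to length() and prints the message
wide_len <- dcast(df, Id ~ Type, value.var = "Freq")
```

In the second call every cell holds 2 because length simply counts how many Freq values fall into that cell.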

Now, looking back at the top part of your data:

Id  Task Type    Freq  
3     1    A       2
3     1    B       3
3     2    A       3
3     2    B       0

You can see why this is the case. For each combination of Id and Type there are two values in Freq (for Id 3: 2 and 3 for Type A, and 3 and 0 for Type B), while the wide dataframe has room for only one value per combination of Id and Type. Therefore dcast has to aggregate these values into one value. The default aggregation function is length, but you can use other aggregation functions like sum, mean, sd or a custom function by specifying them with fun.aggregate.
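
One way to check in advance whether a cast formula will need an aggregation function is to count how many rows fall into each output cell. The sketch below uses only base R; the names cell_counts, needs_aggregation and unique_with_task are made up for illustration:

```r
# Rebuild the data frame shown in the question
df <- data.frame(
  Id   = rep(c(3, 4), each = 4),
  Task = rep(c(1, 1, 2, 2), times = 2),
  Type = rep(c("A", "B"), times = 4),
  Freq = c(2, 3, 3, 0, 3, 3, 1, 3)
)

# Number of rows that land in each Id x Type cell of the wide table
cell_counts <- table(Id = df$Id, Type = df$Type)

# TRUE means at least one cell receives several values,
# so the cast Id ~ Type needs an aggregation function
needs_aggregation <- any(cell_counts > 1)

# With Task added to the key, no combination repeats
unique_with_task <- anyDuplicated(df[, c("Id", "Task", "Type")]) == 0
```

Here cell_counts contains the same numbers that dcast returns when it defaults to length.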

For example, with fun.aggregate = sum you get:

  Id A B
1  3 5 3
2  4 4 6

Now there is no warning because dcast is being told what to do when there is more than one value: return the sum of the values.
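
Put together, a minimal runnable version of the fix might look like this. It again assumes reshape2 is the package providing dcast, and the mean variant is added only to show that any function returning a single value can be swapped in:

```r
library(reshape2)  # assumption: dcast() is taken from reshape2

# Rebuild the data frame shown in the question
df <- data.frame(
  Id   = rep(c(3, 4), each = 4),
  Task = rep(c(1, 1, 2, 2), times = 2),
  Type = rep(c("A", "B"), times = 4),
  Freq = c(2, 3, 3, 0, 3, 3, 1, 3)
)

# Sum Freq over all tasks for each Id/Type combination
wide_sum <- dcast(df, Id ~ Type, value.var = "Freq", fun.aggregate = sum)

# Same reshape, but averaging instead of summing
wide_mean <- dcast(df, Id ~ Type, value.var = "Freq", fun.aggregate = mean)
```

wide_sum matches the table requested in the question: one row per Id and one column per Type.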
