What is the arithmetic mean of an empty sequence?

Date: 2021-02-27 22:56:49

Disclaimer: No, I didn't find any obvious answer, contrary to what I expected!

When looking for code examples of the arithmetic mean, the first several examples I can turn up via Google seem to define the mean of an empty sequence as 0.0 (e.g. here and here ...).

Looking at Wikipedia, however, the arithmetic mean is defined as follows, so an empty sequence (n = 0) would yield 0.0 / 0:

$$A = \frac{1}{n} \sum_{i=1}^{n} a_i$$

In the general case, then, the result is possibly NaN.

So if I write a utility function that calculates the arithmetic mean of a set of floating point values, should I, in the general case:

  • return 0.0 for the empty sequence?
  • return (Q)NaN for the empty sequence?
  • "throw an exception" in case of empty sequence?

5 answers

#1


33  

There isn't an obvious answer because the handling depends on how you want to inform calling code of the error. (Or even if you want to interpret this as an "error".)

Some libraries/programs really don't like raising exceptions, so they do everything with signal values. In that case, returning NaN (because the value of the expression is technically undefined) is a reasonable choice.

You might also want to return NaN if you want to "silently" bring the value forward through multiple other calculations. (Relying on the behavior that NaN combined with anything else is "silently" NaN.)

But note that if you return NaN for the mean of an empty sequence, you place a burden on the calling code: it needs to check the return value of the function to make sure that it isn't NaN, either immediately upon return or later on. This requirement is easy to miss, depending on how fastidious you are about checking return values.

Because of this, other libraries/programs take the viewpoint that error conditions should be "noisy": if you pass an empty sequence to a function that finds the mean of the sequence, then you have obviously done something majorly wrong, and it should be made abundantly clear to you that you've messed up.

Of course, if exceptions can be raised, they need to be handled, but you can do that at a higher level, potentially centralized at the point where it makes the most sense to do so. Depending on your program, this may be easier than double-checking return values, or more in line with your standard error-handling scheme.

Other people would argue that your functions should be robust to the error. For maximum robustness, you probably shouldn't use either NaN or an exception - you need to choose an actual number which "makes sense" as a value for the average of an empty list.

Which value to use is going to be highly specific to your use case. For example, if your sequence is a list of differences/errors, you might want to return 0. If you're averaging test scores (scored 0-100), you might want to return 100 for an empty list ... or 0, depending on your philosophy of what the "starting" score is. It all depends on what the return value is going to be used for.

Given that this "neutral" value is going to vary greatly with the exact use case, you might want to implement it as two functions: one general function which returns NaN or raises an exception, and another that wraps the general function and recognizes the 'error' case. This way you can have multiple versions, each with a different "default" value. Or, if this is something you do a lot, you might even make the "default" value a parameter you can pass.

Again, there isn't a single answer to this question: the average of an empty sequence is undefined. How you want to handle it depends intimately on what the result of the calculation is being used for: Just display, or further calculation? Should an empty list be exceptional, or should it be handled quietly? Do you want to handle the special case at the point in time it occurs, or do you want to hoist/defer the error handling?

#2


28  

Mathematically, it's undefined as the denominator is zero.

Because the behaviour of integer division by zero is undefined in C++, throw an exception if you're working in integral types.

If you're working in IEEE 754 floating point, then return NaN, since the numerator (the sum of no elements) will also be zero. (+Inf would be returned if the numerator were positive, and -Inf if it were negative.)

#3


14  

I suggest keeping the same behavior as for dividing 0.0 by 0, whatever that is. Indeed, one can adopt the as-if rule. This way you remain coherent with other operations and you don't have to make the decision yourself.

(You could even implement it as such, by returning 0.0/0, but the compiler might optimize this in unexpected ways.)

#4


2  

I like defensive coding, so I would throw an exception. You can make it either a specific exception (like empty_sequence_exception) or a division-by-zero error, since the divisor is the length of the sequence, which is 0.

Returning 0.0 is debatable, since there is no data (the sequence is empty).

#5


-1  

The correct answer is that the arithmetic mean of an empty sequence has no meaning, since an empty sequence is essentially an empty set. Division of nothing is meaningless. Zero is certainly not a correct answer. Say a sequence has 3 members, 1, 0 and -1, or is a sequence of all zeros. The mean of both of these would be zero, and should not be confused with an empty sequence.
