I'm using the Statistics::Descriptive library in Perl to calculate frequency distributions and coming up against a floating point rounding error problem.
I pass two values, 0.205 and 0.205 (derived from other numbers and formatted to those values with sprintf), to the stats module and ask it to calculate the frequency distribution, but it gets stuck in an infinite loop.
Stepping through with a debugger I can see that it's doing:
my $interval = $self->{sample_range}/$partitions;
my $iter = $self->{min};
while (($iter += $interval) < $self->{max}) {
    $bins{$iter} = 0;
    push @k, $iter; ##Keep the "keys" unstringified
}
$self->sample_range (the range is max - min) is returning 2.77555756156289e-17 rather than the 0 I'd expect. This means that the loop ((min += range) < max) is, for all intents and purposes, infinite.
DB<8> print $self->{max};
0.205
DB<9> print $self->{min};
0.205
DB<10> print $self->{max}-$self->{min};
2.77555756156289e-17
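A minimal sketch that seems to reproduce the behaviour outside the module. It assumes a Perl built with 64-bit IEEE 754 doubles; the partition count of 4 and the iteration cap are hypothetical values chosen only so the example terminates. The interval is not zero, but it is smaller than the spacing between adjacent doubles near 0.205, so adding it leaves $iter unchanged.

```perl
use strict;
use warnings;

# Assumption: Perl NVs are 64-bit IEEE 754 doubles.
my $min        = 0.205;
my $max        = $min + 2**-55;   # the next representable double above $min
my $partitions = 4;               # hypothetical partition count

my $interval = ($max - $min) / $partitions;   # 2**-57: tiny, but not zero
my $iter     = $min;
my $steps    = 0;
my $cap      = 1000;              # safety cap so this sketch terminates

# Same loop shape as the excerpt above, plus the safety cap.
while (($iter += $interval) < $max) {
    last if ++$steps >= $cap;
}

printf "range=%g interval=%g steps=%d stuck=%s\n",
    $max - $min, $interval, $steps, ($iter == $min ? "yes" : "no");
```

Without the cap the loop would not finish, because $iter never moves away from $min.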
So this looks like a rounding problem. I can't think how to fix this on my side though, and I'm not sure editing the library is a good idea. I'm looking for suggestions of a workaround or alternative.
Cheers, Neil
3 Solutions
#1
I am the Statistics::Descriptive maintainer. Because of the module's numeric nature, many rounding problems have been reported. I believe this particular one was fixed in a version I released recently, later than the one you were using, by using multiplication to compute the divisions instead of +=.
Please use the most up-to-date version from the CPAN, and it should be better.
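As an illustration only (this is not the module's actual code), here is a sketch of the general idea of computing each bin boundary by multiplication rather than by repeated +=. The helper name bin_edges and its arguments are hypothetical. Because the loop is driven by an integer count, it always terminates, even when the interval is too small to move the running value.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT taken from Statistics::Descriptive.
# Returns the upper edge of each of $partitions bins between $min and $max.
sub bin_edges {
    my ($min, $max, $partitions) = @_;
    my $interval = ($max - $min) / $partitions;

    # Each edge is computed directly from $min, so errors do not accumulate
    # and the number of iterations is fixed by $partitions.
    my @edges = map { $min + $_ * $interval } 1 .. $partitions - 1;

    push @edges, $max;   # use the exact maximum for the last edge
    return @edges;
}

my @normal = bin_edges(0, 1, 4);
my @tiny   = bin_edges(0.205, 0.205 + 2**-55, 4);   # adjacent doubles

print "@normal\n";                    # prints "0.25 0.5 0.75 1"
print scalar(@tiny), " edges\n";      # prints "4 edges"
```

With adjacent min and max the inner edges may coincide with min or max, so some bins can be empty by construction, but the computation finishes.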
#2
Not exactly a rounding problem; you can see the more precise values with something like
printf("%.18g %.18g", $self->{max}, $self->{min});
It looks to me like there's a flaw in the module: it assumes the sample range can be divided into $partitions pieces, but because floating point doesn't have infinite precision, this isn't always possible. In your case, the min and max values are exactly adjacent representable values, so there can't be more than one partition. I don't know exactly what the module uses the partitions for, so I'm not sure what the impact of this may be. Another possible problem in the module is that it uses numbers as hash keys, which implicitly stringifies them and so slightly rounds the values.
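A small sketch of both points, assuming 64-bit IEEE 754 doubles and Perl's default number-to-string conversion (about 15 significant digits). The variable names are only for illustration. Two adjacent doubles compare as different numbers, yet they collapse to a single hash key once stringified.

```perl
use strict;
use warnings;

# Assumption: Perl NVs are 64-bit IEEE 754 doubles.
my $x = 0.205;
my $y = $x + 2**-55;   # the next representable double above $x

# More digits than the default print shows, so the difference is visible.
printf "%.18g %.18g\n", $x, $y;
print "numerically equal: ", ($x == $y ? "yes" : "no"), "\n";   # no

# Hash keys are strings, so both numbers become the key "0.205".
my %bins;
$bins{$x} = 0;
$bins{$y} = 0;
print "distinct hash keys: ", scalar(keys %bins), "\n";          # 1
```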
You may have some success in laundering your data through stringification before feeding it to the module:
$data = 0+"$data";
This will at least ensure that two numbers that (with the default printing precision) appear equal are actually equal.
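A short sketch of that effect, again assuming 64-bit doubles and the default printing precision. The launder helper is a hypothetical name wrapping the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the 0+"$data" idiom shown above.
sub launder {
    my ($n) = @_;
    return 0 + "$n";   # stringify with default precision, then numify again
}

my $lo = 0.205;
my $hi = $lo + 2**-55;   # adjacent double; also prints as 0.205

my ($clean_lo, $clean_hi) = (launder($lo), launder($hi));

print "raw range:       ", $hi - $lo, "\n";
print "laundered range: ", $clean_hi - $clean_lo, "\n";   # prints 0
```

The cost is that any real difference below the default printing precision is discarded along with the noise.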
#3
That shouldn't cause an infinite loop. What would make that loop infinite is $self->{sample_range}/$partitions being 0.