32-bit integer overflow in pandas/numpy int64 (Python 3.6)

Time: 2021-03-01 21:26:29

Let me start with the example code:

import numpy
from pandas import DataFrame

a = DataFrame({"nums": [2233, -23160, -43608]})

a.nums = numpy.int64(a.nums)

print(a.nums ** 2)
print((a.nums ** 2).sum())

On my local machine, and other devs' machines, this works as expected and prints out:

0       4986289
1     536385600
2    1901657664
Name: nums, dtype: int64
2443029553

However, on our production server, we get:

0       4986289
1     536385600
2    1901657664
Name: nums, dtype: int64
-1851937743

This is 32-bit integer overflow, even though the dtype is int64.
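
To check that the production result really is the correct sum wrapped to 32 bits, here is a small sketch using only integer arithmetic. The helper name wrap_int32 is made up for illustration; it is not part of numpy or pandas.

```python
def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range using two's-complement wrap-around."""
    return (value + 2**31) % 2**32 - 2**31

# The three squared values from the printed Series.
squares = [2233 ** 2, (-23160) ** 2, (-43608) ** 2]
true_sum = sum(squares)

print(true_sum)              # 2443029553, the result on the dev machines
print(wrap_int32(true_sum))  # -1851937743, the result on the production server
```

The correct sum exceeds 2**31 - 1 (2147483647), so truncating it to 32 bits gives exactly the negative number seen in production.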

The production server is using the same versions of Python, numpy, pandas, etc. It runs 64-bit Windows Server 2012, and everything reports 64-bit (e.g. python --version, sys.maxsize, platform.architecture).

What could possibly be causing this?

1 solution

#1


6  

This is a bug in the bottleneck library, which Pandas uses if it's installed. In some circumstances, bottleneck.nansum incorrectly has 32-bit overflow behavior when called on 64-bit input.
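
If you want to see whether a given machine is affected, a rough sketch follows: it checks whether bottleneck is importable and, if pandas is present, asks pandas not to use it. The two function names are made up for illustration. The "compute.use_bottleneck" option is assumed to be available in your pandas version (older releases may not have it).

```python
import importlib.util

def bottleneck_installed():
    """Return True if the bottleneck package can be imported in this environment."""
    return importlib.util.find_spec("bottleneck") is not None

def disable_bottleneck_in_pandas():
    """Tell pandas not to use bottleneck; returns False if pandas is not installed."""
    if importlib.util.find_spec("pandas") is None:
        return False
    import pandas as pd
    # Assumption: this option exists in the installed pandas version.
    pd.set_option("compute.use_bottleneck", False)
    return True

print("bottleneck installed:", bottleneck_installed())
print("bottleneck disabled in pandas:", disable_bottleneck_in_pandas())
```

Running this on both a dev machine and the production server should show whether the two environments differ in having bottleneck installed. With the option off, the sum goes through pandas' own numpy-based code path instead.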

I believe this is due to bottleneck using PyInt_FromLong even when long is 32-bit. I'm not sure why that even compiles, actually. There's an issue report on the bottleneck issue tracker, not yet fixed, as well as an issue report on the Pandas issue tracker, where they tried to compensate for Bottleneck's issue (but I think they turned off Bottleneck when it does work instead of when it doesn't).
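
The point about long being 32-bit can be checked directly. This sketch only reports the width of the C long type on the current platform; it does not exercise bottleneck itself. The helper name c_long_bits is made up for illustration.

```python
import ctypes
import sys

def c_long_bits():
    """Return the width in bits of the platform's C long type."""
    return ctypes.sizeof(ctypes.c_long) * 8

print("C long width:", c_long_bits(), "bits")
print("64-bit Python:", sys.maxsize > 2**32)
```

On 64-bit Windows, C long is 32 bits wide even in a 64-bit Python build, whereas on 64-bit Linux and macOS it is 64 bits. That is consistent with sys.maxsize and platform.architecture reporting 64-bit while a code path that goes through long still overflows.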
