Storing large numbers in a numpy array

时间:2021-10-19 18:20:23

I have a dataset to which I'm trying to apply an arithmetical method. The thing is it gives me relatively large numbers, and when I do it with numpy, they're stored as 0.

The weird thing is, when I compute the numbers apart, they have an int value; they only become zeros when I compute them using numpy.

x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]: 36298069767006890

vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])

I have of course checked their types:

type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]: True

How can I compute these large numbers using numpy without seeing them turned into zeros?

Note that if we remove the factor 10 at the beginning, the problem changes slightly (but I think the reason might be similar):

x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]: 311075541538526549

vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
   -224960293581823801, -368934881474191033])

The negative numbers indicate that the maximum value of the int64 type has been exceeded, don't they?

2 Answers

#1


4  

As Nils Werner already mentioned, numpy's native dtypes (fixed-width C integers) cannot hold numbers that large, but Python itself can, since its int objects use an arbitrary-precision implementation. So what you can do is tell numpy not to convert the numbers to fixed-width types but to use the Python objects instead. This will be slower, but it will work.

In [14]: x = np.array([18,30,31,31,15], dtype=object)

In [15]: 150**x
Out[15]: 
array([1477891880035400390625000000000000000000L,
       191751059232884086668491363525390625000000000000000000000000000000L,
       28762658884932613000273704528808593750000000000000000000000000000000L,
       28762658884932613000273704528808593750000000000000000000000000000000L,
       437893890380859375000000000000000L], dtype=object)

In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations, they won't be performed on the numpy array but on the objects behind the references.
I think you're still able to use most of the numpy functions with this workaround, but they will definitely be a lot slower than usual.
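As a sketch of how this workaround applies to the full computation from the question (using floor division so the results stay exact Python ints rather than being converted to floats):

```python
import numpy as np

# dtype=object makes numpy store references to Python ints,
# which have arbitrary precision, so nothing overflows.
x = np.array([18, 30, 31, 31, 15], dtype=object)

# Floor division (//) keeps each result an exact Python int;
# true division (/) would coerce the results to floats.
vector = 10 * 150**x // x

print(vector)
```

Each element-wise operation here dispatches to the Python int methods behind the references, which is why it stays correct but runs slower than native dtype arithmetic.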

But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.

Just for completeness, if precision is not an issue, you can also use floats:

In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)

In [20]: 150**x
Out[20]: 
array([  1.47789188e+39,   1.91751059e+65,   2.87626589e+67,
         2.87626589e+67,   4.37893890e+32])
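For reference, float64 trades exactness for range: it stays finite up to roughly 1.8e308, so these values (around 1e67 at most) fit comfortably. A quick check (a sketch, not part of the original answer):

```python
import numpy as np

x = np.array([18, 30, 31, 31, 15], dtype=np.float64)
vector = 10 * 150**x / x

# Largest finite float64 value, roughly 1.8e308 -- far above
# the ~1e67 magnitudes produced here, so no overflow to inf.
print(np.finfo(np.float64).max)
print(np.all(np.isfinite(vector)))
```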

#2


2  

150 ** 28 is way beyond what an int64 variable can represent (it's in the ballpark of 8e60 while the maximum possible value of an unsigned int64 is roughly 18e18).

Python may be using an arbitrary length integer implementation, but NumPy doesn't.

As you deduced correctly, negative numbers are a symptom of an int overflow.
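The limit can be checked directly with np.iinfo (a sketch; the exact negative values in the question's output follow from the arithmetic wrapping around modulo 2**64):

```python
import numpy as np

# The largest value an int64 can hold: 2**63 - 1.
print(np.iinfo(np.int64).max)             # 9223372036854775807, ~9.2e18

# Python's arbitrary-precision ints confirm 150**28 is far larger:
print(150**28 > np.iinfo(np.int64).max)   # True
```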
