I'm learning to use NumPy and wanted to compare how fast it sums a list of numbers against plain Python, so I wrote this code:
import time
import numpy

# Sum with NumPy
np_array = numpy.arange(1000000)
start = time.time()
sum_ = np_array.sum()
print time.time() - start, sum_
>>> 0.0 1783293664
# Sum with a plain Python list
python_list = range(1000000)
start = time.time()
sum_ = sum(python_list)
print time.time() - start, sum_
>>> 0.390000104904 499999500000
The python_list sum is correct.
If I run the same code summing only up to 1000, both print the right answer. Is there an upper limit on the length of a NumPy array, or is the limit in NumPy's sum function?
Thanks for your help
3 Answers
#1
9
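NumPy is creating an array of 32-bit integers (signed rather than unsigned, since the default integer dtype is signed; the wrapped result is the same here either way). When it sums them, it accumulates the result in a 32-bit value, which silently wraps around on overflow.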
if 499999500000 % (2**32) == 1783293664:
    print("Overflowed a 32-bit integer")
You can explicitly choose the data type at array creation time:
a = numpy.arange(1000000, dtype=numpy.uint64)
a.sum()  # 499999500000
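A minimal sketch of the same effect that does not depend on the platform's default integer width. On many current 64-bit builds the default integer is already 64 bits, so the original code may not overflow there; this example pins the accumulator type with the `dtype` argument of `sum` to reproduce the wraparound deliberately. The variable names are illustrative only.

```python
import numpy as np

n = 1000000
values = np.arange(n, dtype=np.int32)

# Force a 32-bit accumulator: the running total wraps modulo 2**32.
wrapped = values.sum(dtype=np.int32)

# Ask for a 64-bit accumulator instead: the total fits and is exact.
exact = values.sum(dtype=np.int64)

print(int(wrapped), int(exact))
```

Passing `dtype` to `sum` changes only the accumulator, so the array itself can stay 32-bit if memory matters.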
#2
9
Python's built-in sum switched over to doing arithmetic with the arbitrary-precision long type once the running total no longer fit in a 32-bit int.
The NumPy array did not switch to long and suffered integer overflow. The price of speed is a smaller range of allowed values.
>>> 499999500000 % 2**32
1783293664L
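The difference between the two behaviours can be sketched in plain Python. The helper below, `wrapping_sum_int32`, is a hypothetical name used only for illustration: it imitates a signed 32-bit accumulator by keeping just the low 32 bits after each addition, while the built-in `sum` keeps growing the integer as needed.

```python
def wrapping_sum_int32(values):
    """Add values the way a signed 32-bit accumulator would (illustrative helper)."""
    total = 0
    for value in values:
        # Keep only the low 32 bits after every addition.
        total = (total + value) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed number.
    if total >= 2**31:
        total -= 2**32
    return total

n = 1000000
exact = sum(range(n))                    # Python ints never overflow
wrapped = wrapping_sum_int32(range(n))   # fixed-width arithmetic wraps

print(exact, wrapped)  # 499999500000 1783293664
```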
#3
6
Notice that 499999500000 % 2**32 equals exactly 1783293664, i.e., NumPy is doing arithmetic modulo 2**32, because that is the width of the integer type the array uses (here the platform default, since no dtype was given).
Make np_array = numpy.arange(1000000, dtype=numpy.uint64), for example, and your sum will come out OK (although of course any finite-size number type still has limits).
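To make the "there are still limits" point concrete, here is a rough sketch that finds how long an `arange`-style sequence can get before its sum, n*(n-1)/2, no longer fits in a given integer type. `largest_safe_length` is a hypothetical helper written for this illustration, not a NumPy function.

```python
def largest_safe_length(max_value):
    """Largest n such that 0 + 1 + ... + (n - 1) == n*(n-1)//2 <= max_value."""
    lo, hi = 1, 2
    # Grow hi until its sum no longer fits.
    while hi * (hi - 1) // 2 <= max_value:
        hi *= 2
    # Binary search: lo always fits, hi never does.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * (mid - 1) // 2 <= max_value:
            lo = mid
        else:
            hi = mid
    return lo

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

print("int32 accumulator is safe up to n =", largest_safe_length(INT32_MAX))
print("int64 accumulator is safe up to n =", largest_safe_length(INT64_MAX))
```

For a signed 32-bit accumulator the limit is reached at a length of only 65536, which is why summing to 1000 worked and summing to 1000000 did not.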
You can use dtype=object (spelled numpy.object in older NumPy releases; that alias has since been removed) to tell NumPy that the array holds generic Python objects; of course, performance will degrade as generality increases.
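A small sketch of the object-dtype route, assuming a recent NumPy where the dtype is spelled plain `object`. The elements are ordinary Python ints, so the sum uses Python's arbitrary-precision addition and can exceed the 64-bit range; the values are chosen only to show that.

```python
import numpy as np

# Two Python ints whose sum does not fit in a signed 64-bit integer.
big = np.array([2**63 - 1, 1], dtype=object)

total = big.sum()  # element-wise Python addition, no fixed-width wraparound

print(big.dtype, total == 2**63)
```

The cost is that each element is a boxed Python object, so the speed advantage that motivated using NumPy in the first place largely disappears.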