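Today I went through a snippet of NumPy array-operation code. Its main purpose is to compare the performance of pure Python against NumPy.
Under Python 2, the code is: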
def pysum(n):
    a = range(n)
    b = range(n)
    c = []
    i = 0
    for i in list(range(len(a))):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])
    return c

c = pysum(10)
Under Python 3 it fails with:
TypeError: 'range' object does not support item assignment
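The reason is that in Python 3, a = range(n) produces range(0, n). This is a range object, not a list, and range objects cannot be modified in place. Convert it to a list with a = list(range(n)). For example, a = list(range(10)) gives a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
The converted code: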
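A minimal Python 3 sketch of the difference. The variable names r and a are only for illustration. A range object supports reading by index, but item assignment raises TypeError. Wrapping it in list() gives a mutable copy.

```python
r = range(5)
print(type(r).__name__)  # range
print(r[2])              # 2: reading by index works

try:
    r[0] = 100           # writing does not: range is an immutable sequence
except TypeError as exc:
    print("TypeError:", exc)

a = list(range(5))       # materialise the values into a mutable list
a[0] = 100
print(a)                 # [100, 1, 2, 3, 4]
```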
def pysum(n):
    a = list(range(n))
    b = list(range(n))
    c = []
    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])
    return a, b, c  # return a, b and c together to inspect the results

pysum(10)
The result:
Out[42]:
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81],
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729],
[0, 2, 12, 36, 80, 150, 252, 392, 576, 810])
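In Python 3 the same values can be built without preallocating and overwriting lists. The sketch below uses list comprehensions. The name pysum_comp is hypothetical and is not part of the original code. It returns the same (a, b, c) tuple as above.

```python
def pysum_comp(n):
    """Hypothetical list-comprehension variant of pysum; returns (a, b, c)."""
    a = [i ** 2 for i in range(n)]   # squares
    b = [i ** 3 for i in range(n)]   # cubes
    c = [x + y for x, y in zip(a, b)]  # element-wise sum
    return a, b, c

a, b, c = pysum_comp(10)
print(c)  # [0, 2, 12, 36, 80, 150, 252, 392, 576, 810]
```

Python still executes one loop iteration per element here, so this is tidier but not fundamentally faster.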
The NumPy implementation:
import numpy as np

def npsum(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    c = a + b
    return c

npsum(10)
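One caveat with np.arange(n) is that its default integer type is platform dependent. Under NumPy 1.x on Windows it is 32-bit, and i ** 2 + i ** 3 then overflows silently once n is a little above 1290. The variant below makes two assumptions: it passes dtype=np.int64 explicitly, and it checks the result against a plain Python computation. The name npsum64 is hypothetical.

```python
import numpy as np

def npsum64(n):
    # dtype=np.int64 is an assumption added to avoid 32-bit overflow of i ** 3
    i = np.arange(n, dtype=np.int64)
    # ** and + act on whole arrays at once, with no Python-level loop
    return i ** 2 + i ** 3

result = npsum64(10)
expected = [i ** 2 + i ** 3 for i in range(10)]
print(result.tolist() == expected)  # True
```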
Comparing the efficiency of the two implementations:
# Efficiency comparison
from datetime import datetime

size = 1000

start = datetime.now()
c = pysum(size)[-1]  # the Python 3 pysum above returns (a, b, c); keep only c
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
# total_seconds() covers the whole interval; .microseconds is only the sub-second part
print("PySum elapsed time in microseconds", round(delta.total_seconds() * 1e6))

start = datetime.now()
c = npsum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("NPSum elapsed time in microseconds", round(delta.total_seconds() * 1e6))
Output:
# Printed by the pysum() part:
The last 2 elements of the sum [995007996, 998001000]
PySum elapsed time in microseconds 2000
# Printed by the npsum() part:
The last 2 elements of the sum [995007996 998001000]
NPSum elapsed time in microseconds 0
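So npsum runs far faster than the pure-Python version. NumPy evaluates ** 2, ** 3 and + over whole arrays in compiled loops, whereas pysum executes Python bytecode once per element. The round figures above (2000 and 0 microseconds) suggest that datetime.now() had only millisecond-level resolution on the machine used. The 0 should therefore be read as "below the clock's resolution", not as literally zero time. This difference matters a great deal in data analysis, especially in machine learning, where computation is expensive.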
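A single call of each function is very short, and datetime.now() is a wall-clock timestamp. A steadier comparison repeats each call many times using timeit from the standard library, which by default times with the high-resolution time.perf_counter. The sketch below is not the original benchmark. It assumes a pysum variant that returns only c, int64 arrays in npsum, and a hypothetical repeat count of 100.

```python
import timeit
import numpy as np

def pysum(n):
    # Python 3 variant of pysum that returns only c (assumption for timing)
    a = list(range(n))
    b = list(range(n))
    c = []
    for i in range(n):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])
    return c

def npsum(n):
    # int64 is an assumption to avoid 32-bit overflow on some platforms
    a = np.arange(n, dtype=np.int64) ** 2
    b = np.arange(n, dtype=np.int64) ** 3
    return a + b

size = 1000
repeats = 100  # hypothetical repeat count; averaging smooths out timer noise

py_total = timeit.timeit(lambda: pysum(size), number=repeats)
np_total = timeit.timeit(lambda: npsum(size), number=repeats)

py_us = py_total / repeats * 1e6  # mean microseconds per call
np_us = np_total / repeats * 1e6
print(f"pysum: {py_us:.1f} us per call")
print(f"npsum: {np_us:.1f} us per call")
print(f"speed-up: {py_us / np_us:.1f}x")
```

The absolute numbers depend on the machine. Comparing per-call averages is less sensitive to clock resolution than a single datetime.now() difference.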