I tried to Cythonize part of my code as follows, hoping to gain some speed:
# cython: boundscheck=False
import numpy as np
cimport numpy as np
import time

cpdef object my_function(np.ndarray[np.double_t, ndim = 1] array_a,
                         np.ndarray[np.double_t, ndim = 1] array_b,
                         int n_rows,
                         int n_columns):
    cdef double minimum_of_neighbours, difference, change
    cdef int i
    cdef np.ndarray[np.int_t, ndim =1] locations
    locations = np.argwhere(array_a > 0)
    for i in locations:
        minimum_of_neighbours = min(array_a[i - n_columns], array_a[i+1], array_a[i + n_columns], array_a[i-1])
        if array_a[i] - minimum_of_neighbours < 0:
            difference = minimum_of_neighbours - array_a[i]
            change = min(difference, array_a[i] / 5.)
            array_a[i] += change
            array_b[i] -= change * 5.
    print time.time()
    return array_a, array_b
It compiles without errors, but when I call the function I get this error:
from cythonized_code import my_function
import numpy as np
array_a = np.random.uniform(low=-100, high=100, size = 100).astype(np.double)
array_b = np.random.uniform(low=0, high=20, size = 100).astype(np.double)
a, b = my_function(array_a,array_b,5,20)
# which gives me this error:
# locations = np.argwhere(array_a > 0)
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
Do I need to declare the type of locations here? The reason I wanted to declare it is that it is highlighted in yellow in the annotated HTML file generated when compiling the code.
1 Answer
It is a good idea not to use the Python-level access locations[i], because it is expensive: Python would create a full-fledged Python integer* from the lowly C integer (which is what is stored in the locations NumPy array), allocate and reference-count it, then cast it back to int and destroy the Python object. That is quite an overhead.
To get direct access to the underlying C integers, one needs to bind locations to a type. The normal course of action is to look up which properties locations has:
>>> locations.ndim
2
>>> locations.dtype
dtype('int64')
which translates to cdef np.ndarray[np.int64_t, ndim=2] locations.
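The shape reported above can be reproduced in plain Python. The following is a minimal sketch with made-up sample data (not the asker's arrays): it shows that np.argwhere on a 1-D input returns a 2-D array with one column, which is why a buffer declared with ndim=1 is rejected. As an alternative that is not part of the answer, np.flatnonzero returns the same indices as a 1-D array. The index dtype is np.intp, which is int64 on typical 64-bit platforms.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
array_a = np.array([-1.0, 2.0, 0.0, 3.5, -4.0])

# np.argwhere returns one row per match and one column per input dimension,
# so a 1-D input gives a 2-D result of shape (n_matches, 1).
locations_2d = np.argwhere(array_a > 0)

# np.flatnonzero returns the same indices as a flat 1-D array.
locations_1d = np.flatnonzero(array_a > 0)

print(locations_2d.ndim, locations_2d.shape)  # 2 (2, 1)
print(locations_1d.ndim, locations_1d.shape)  # 1 (2,)
```

If the 1-D form is used, the declaration would need ndim=1 with a matching integer type instead of ndim=2.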
However, this will probably not be enough to get rid of the Python overhead (I cannot check it right now), because of a Cython quirk:
for i in locations:
    ...
will not be interpreted as raw array access but will invoke the Python machinery. See for example here.
So you will have to change it to:
for index in range(len(locations)):
    i=locations[index][0]
Then Cython "understands" that you want access to the raw C int64 array.
* Actually, this is not completely true: in this case a temporary np.ndarray is created first (e.g. locations[0] or locations[1]), and then __Pyx_PyInt_As_int (which is more or less an alias for PyLong_AsLongAndOverflow) is called. It creates a PyLongObject, from which the C int value is obtained before the temporary PyLongObject and np.ndarray are destroyed.
Here we get lucky, because length-1 NumPy arrays can be converted to Python scalars. The code would not work if the second dimension of locations were greater than 1.
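The temporary array mentioned in the footnote is easy to see from plain Python. This is a small sketch with made-up data: chained indexing locations[index][0] first builds a length-1 array for the row, while a single two-part index locations[index, 0] returns the integer directly. Whether Cython turns the second form into a raw buffer access for a typed 2-D array is an assumption here and should be checked in the annotated HTML output.

```python
import numpy as np

# Hypothetical sample data; argwhere on a 1-D input has shape (n_matches, 1).
locations = np.argwhere(np.array([-1.0, 2.0, 0.0, 3.5]) > 0)

# Chained indexing: the first step creates a temporary 1-D array of length 1.
row = locations[0]

# A single index with both dimensions returns a NumPy integer scalar.
value = locations[0, 0]

print(type(row).__name__, row.shape)  # ndarray (1,)
print(int(value))                     # 1
```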