Cython: how to declare numpy.argwhere()

Date: 2021-12-12 21:21:50

I tried to Cythonize part of my code as follows, hoping to gain some speed:

# cython: boundscheck=False
import numpy as np
cimport numpy as np
import time

cpdef object my_function(np.ndarray[np.double_t, ndim = 1] array_a,
                     np.ndarray[np.double_t, ndim = 1] array_b,
                     int n_rows,
                     int n_columns):
    cdef double minimum_of_neighbours, difference, change
    cdef int i
    cdef np.ndarray[np.int_t, ndim =1] locations
    locations = np.argwhere(array_a > 0)

    for i in locations:
        minimum_of_neighbours = min(array_a[i - n_columns], array_a[i+1], array_a[i + n_columns], array_a[i-1])
        if array_a[i] - minimum_of_neighbours < 0:
            difference = minimum_of_neighbours - array_a[i]
            change = min(difference, array_a[i] / 5.)
            array_a[i] += change
            array_b[i] -= change * 5.
        print(time.time())

    return array_a, array_b

It compiles without errors, but when I call the function I get this error:

from cythonized_code import my_function
import numpy as np

array_a = np.random.uniform(low=-100, high=100, size = 100).astype(np.double)
array_b = np.random.uniform(low=0, high=20, size = 100).astype(np.double)

a, b = my_function(array_a,array_b,5,20)

# which gives me this error:    
# locations = np.argwhere(array_a > 0)
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Do I need to declare the type of locations here? I wanted to declare it because that line is highlighted yellow in the annotated HTML file generated when compiling the code.

1 answer

It's a good idea not to use the Python functionality locations[i], because it is too expensive: Python would create a full-fledged Python integer* from the lowly C integer (which is what is stored in the locations NumPy array), register it with the garbage collector, then cast it back to int and destroy the Python object, which is quite an overhead.
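
The boxing described above can be observed from plain Python. This is a small NumPy sketch, not Cython, and the array values are made up for illustration: each indexing operation wraps the stored C integer in a freshly allocated scalar object.

```python
import numpy as np

# Hypothetical example data: a 2-D index array like the one np.argwhere returns.
locations = np.array([[3], [7], [12]], dtype=np.intp)

first = locations[0, 0]
second = locations[0, 0]

# Each access boxes the raw C integer into a new NumPy scalar object...
print(type(first))      # a NumPy integer scalar type, e.g. numpy.int64 on 64-bit platforms
# ...so two accesses to the same element yield two distinct objects.
print(first is second)  # False
# Converting to a plain Python int allocates yet another object.
print(int(first))       # 3
```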

To get direct access to the lowly C integers, one needs to give locations a type. The normal course of action would be to look up which properties locations has:

>>> locations.ndim
2
>>> locations.dtype
dtype('int64')

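which translates to cdef np.ndarray[np.int64_t, ndim=2] locations. np.argwhere returns one row per match and one column per dimension of its input, so even for the 1-D array_a the result is 2-D; that is exactly why the ndim=1 declaration in the question raises "Buffer has wrong number of dimensions". (The index dtype is np.intp, which is int64 on a 64-bit platform.)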
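
The shape and dtype can be checked in plain Python before writing the Cython declaration. In this sketch the input values are made up, and np.flatnonzero is shown only as an alternative that the answer does not mention: it returns a 1-D array of flat indices, which would match an ndim=1 declaration.

```python
import numpy as np

array_a = np.array([-1.0, 2.5, 0.0, 4.0, -3.0])

locations = np.argwhere(array_a > 0)
print(locations.ndim)              # 2
print(locations.shape)             # (2, 1): one row per positive element, one column per input dimension
print(locations.dtype == np.intp)  # True: pointer-sized integer indices

# Alternative (not from the original answer): a 1-D array of flat indices.
flat = np.flatnonzero(array_a > 0)
print(flat.ndim)                   # 1
print(flat.tolist())               # [1, 3]
```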

However, this will probably (I can't check it right now) not be enough to get rid of the Python overhead, because of a Cython quirk:

for i in locations:
    ...

will not be interpreted as raw array access but will invoke the Python machinery. See for example here.
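
From the Python side the quirk is easy to see: iterating over a 2-D array yields each row as a separate 1-D array object rather than an integer. The sketch below uses plain NumPy with made-up data and contrasts that with indexing both dimensions through an explicit integer index.

```python
import numpy as np

locations = np.argwhere(np.array([0.0, 5.0, -2.0, 1.0]) > 0)  # shape (2, 1)

# Iterating directly: every item is a freshly created 1-D array of length 1.
rows = [row for row in locations]
print([type(row).__name__ for row in rows])  # ['ndarray', 'ndarray']
print([row.shape for row in rows])           # [(1,), (1,)]

# Indexing both dimensions with an integer index gives the stored values themselves.
values = [locations[index, 0] for index in range(locations.shape[0])]
print([int(v) for v in values])              # [1, 3]
```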

So you will have to change it to:

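cdef Py_ssize_t index
for index in range(locations.shape[0]):
    # locations[index, 0], not locations[index][0]: only full indexing compiles to raw buffer access
    i = locations[index, 0]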

then Cython "understands" that you want direct access to the raw C int64 array.
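
To have something to test a compiled version against, the question's loop can be written in plain Python/NumPy with the same index-based access. This is only a sketch: the name my_function_reference is hypothetical, and it assumes, as the original code implicitly does, that every positive cell has all four neighbours inside the array.

```python
import numpy as np

def my_function_reference(array_a, array_b, n_rows, n_columns):
    """Pure-Python sketch of the question's loop (hypothetical helper).

    Assumes every positive cell has four neighbours inside the array;
    n_rows is unused, just as in the original function.
    """
    locations = np.argwhere(array_a > 0)   # shape (N, 1)
    for index in range(locations.shape[0]):
        i = locations[index, 0]            # full indexing: one integer per row
        minimum_of_neighbours = min(array_a[i - n_columns], array_a[i + 1],
                                    array_a[i + n_columns], array_a[i - 1])
        if array_a[i] - minimum_of_neighbours < 0:
            difference = minimum_of_neighbours - array_a[i]
            change = min(difference, array_a[i] / 5.)
            array_a[i] += change
            array_b[i] -= change * 5.
    return array_a, array_b

# Usage: a 5x5 grid flattened row by row; only interior cells are positive.
grid = np.zeros(25)
grid[[7, 11, 13, 17]] = 10.0   # neighbours of the centre cell
grid[12] = 1.0                 # centre cell is lower than all its neighbours
b = np.full(25, 20.0)
a_out, b_out = my_function_reference(grid, b, 5, 5)
print(a_out[12], b_out[12])    # 1.2 19.0
```

The positive neighbours themselves border zero-valued cells, so only the centre cell changes: it rises by a fifth of its value and array_b drops by five times that amount.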


  • Actually, this is not completely true: in this case an ndarray is created first (e.g. locations[0] or locations[1]), and then __Pyx_PyInt_As_int (which is more or less an alias for PyLong_AsLongAndOverflow) is called. It creates a PyLongObject, from which the C int value is obtained before the temporary PyLongObject and ndarray are destroyed.

Here we get lucky, because length-1 NumPy arrays can be converted to Python scalars. The code would not work if the second dimension of locations were greater than 1.
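
The "lucky" conversion can be reproduced from Python: int() accepts a length-1 array but rejects a longer one. NumPy 1.25 deprecated this conversion for arrays with ndim > 0, so the sketch silences that DeprecationWarning; relying on it is fragile either way. The arrays are made-up stand-ins for the rows that iteration would yield.

```python
import warnings
import numpy as np

one_element_row = np.array([7])     # a row of locations when its shape is (N, 1)
two_element_row = np.array([7, 2])  # a row if the second dimension were 2

with warnings.catch_warnings():
    # NumPy >= 1.25 emits a DeprecationWarning for int() on arrays with ndim > 0.
    warnings.simplefilter("ignore", DeprecationWarning)
    converted = int(one_element_row)
print(converted)  # 7

try:
    int(two_element_row)
except TypeError as err:
    print("TypeError:", err)
```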
