numpy中列表滑动的矢量化实现

时间:2022-02-13 03:33:55

from given numpy array [1,2,3,4] and window wz=2 (two elements before and two elements after every element) I have to get pairs (central el, el from window). Pairs with unexisting elements could be skipped or substituted by zero. So on this example I have to get this:

从给定的numpy数组[1,2,3,4]和窗口wz = 2(前两个元素和每个元素后面的两个元素)我必须得到对(中心el,el来自窗口)。具有未存在元素的对可以被跳过或替换为零。所以在这个例子中我必须得到这个:

[[1., 0.]
 [2., 1.]
 [3., 2.]
 [4., 3.]
 [1., 2.]
 [2., 3.]
 [3., 4.]
 [4., 0.]
 [1., 0.]
 [2., 0.]
 [3., 1.]
 [4., 2.]
 [1., 3.]
 [2., 4.]
 [3., 0.]
 [4., 0.]]

My implementation is extremely unefficient and looks like:

我的实现非常低效,看起来像:

x = np.array([1,2,3,4])
l = x.shape[0]
for i in range(1, m):
    init = np.empty((x.shape[0]*2,2))
    init[:,0] = np.append(x, x)
    init[:l,1] = np.pad(x, (i,0), mode='constant')[:l]
    init[-l:,1] = np.pad(x, (0,i), mode='constant')[-l:]
    corpus.extend(init)

Could someone help with much more efficient solution? On another simple test data and variants I've implemented I've got:

有人可以帮助提高效率吗?在另一个我已实现的简单测试数据和变体上,我得到了:

285 µs ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
379 µs ± 7.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

3 个解决方案

#1


1  

In case x is some data, like words or random values and we need to recombine it we could use reindexing mechanism in numpy.

如果x是某些数据,如单词或随机值,我们需要重新组合它,我们可以在numpy中使用重建索引机制。

Substituted by zero version

由零版本替代

x = np.array([1,2,3,4])
wz = 2
zero = 0

Lets build indexing matrix.

让我们构建索引矩阵。

ri = np.arange(-wz,wz+1)+np.arange(x.shape[0]).reshape(-1,1)
print(ri) 

Output:

  [[-2, -1,  0,  1,  2],
   [-1,  0,  1,  2,  3],
   [ 0,  1,  2,  3,  4],
   [ 1,  2,  3,  4,  5]

Now if we add zero to x as last element we can replace wrong indexes with it index.

现在,如果我们将x作为最后一个元素添加零,我们可以用它索引替换错误的索引。

np.place(ri,(ri<0)|(ri>x.shape[0]),x.shape[0]) #replace wrong indexes
np.vstack((
    np.hstack((x,[zero]))[ri].reshape(1,-1),#extending x with zero and reindexing 
    np.tile(x,2*wz+1)) #repeating basic `x` to each window position
    )#.T #uncomment .T to make it vertical   

Output:

 ([[0, 0, 1, 2, 3, 0, 1, 2, 3, 4, 1, 2, 3, 4, 0, 2, 3, 4, 0, 0],
   [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]])

Skipped version

The same idea, but in a slightly different order: produce a complete indexing matrix [window_index,x_index] then exclude the wrong pairs, and finally re-indexing 'x'.

相同的想法,但顺序略有不同:产生一个完整的索引矩阵[window_index,x_index]然后排除错误的对,最后重新索引'x'。

x = np.array([1,2,3,4])
wz = 2
ri = np.vstack((
    (np.arange(-wz,wz+1)+np.arange(x.shape[0]).reshape(-1,1)).ravel(),#same index matrix flaten 
    np.tile(np.arange(x.shape[0]),2*wz+1) #repeating `x` indexes to each window position
    )) 
x[ri[:,(ri[0]>=0)&(ri[0]<x.shape[0])]]#.T #uncomment .T to make it vertical   

Output:

 [[1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4],
  [3, 4, 1, 3, 4, 1, 2, 3, 4, 1, 2, 4, 1, 2]]

Update 1 (error fix) exclude zero from window to avoid pair duplication.

更新1(错误修复)从窗口中排除零以避免对重复。

x = np.array([1,2,3,4])
wz = 2
ri = np.vstack(((
        np.hstack(( np.arange(-wz,0), #remove zero from window
                    np.arange(1,wz+1)))+
        np.arange(x.shape[0]).reshape(-1,1)).ravel(), #same index matrix flaten 
    np.tile(np.arange(x.shape[0]),2*wz) #repeating `x` indexes to each window position
    )) 
x[ri[:,(ri[0]>=0)&(ri[0]<x.shape[0])]]#.T #uncomment .T to make it vertical   

Output:

  [[2, 3, 1, 3, 4, 1, 2, 4, 2, 3],
   [3, 4, 2, 3, 4, 1, 2, 3, 1, 2]]

Check documentation on used functions np.arange, np.reshape, np.place, np.hstack, broadcasting rules and indexing.

检查有关已使用函数np.arange,np.reshape,np.place,np.hstack,广播规则和索引的文档。

#2


2  

Here is a Numpythonic approach:

这是一个Numpythonic方法:

In [23]: a = np.array([1,2,3,4])
In [24]: arr = np.hstack((a-1, a+1, a - 2, a+ 2))
In [25]: mask = ~np.in1d(arr, a)
In [26]: arr[mask] = 0
In [27]: np.column_stack((np.tile(a, 4), arr))
Out[27]: 
array([ [1, 0],
        [2, 1],
        [3, 2],
        [4, 3],
        [1, 2],
        [2, 3],
        [3, 4],
        [4, 0],
        [1, 0],
        [2, 0],
        [3, 1],
        [4, 2],
        [1, 3],
        [2, 4],
        [3, 0],
        [4, 0]])

#3


0  

The numpy approach is favorable, but here is a functional approach for those interested:

numpy方法是有利的,但对于那些感兴趣的人来说,这是一种功能性的方法:

Given

import functools as ft


# Helper function
def curry(f):
    @ft.wraps(f)
    def wrapped(arg):
        try:
            return f(arg)
        except TypeError:
            return curry(ft.wraps(f)(ft.partial(f, arg)))
    return wrapped

Code

lst = [1, 2, 3, 4]
c = curry(lambda x, y: x + y)
funcs = [c(-1), c(1), c(-2), c(2)]
set_ = set(lst)


[[x, 0] if fn(x) not in set_ else [x, fn(x)] for fn in funcs for x in lst]

Output

[[1, 0],
 [2, 1],
 [3, 2],
 [4, 3],
 [1, 2],
 [2, 3],
 [3, 4],
 [4, 0],
 [1, 0],
 [2, 0],
 [3, 1],
 [4, 2],
 [1, 3],
 [2, 4],
 [3, 0],
 [4, 0]]

Details

In the double for loops of the list comprehension, a list of curried functions is iterated, and each function is applied to every element of the primary list (lst). Currying allows you to compute new values by passing in some argument (e.g. 1, -1, -2, 2) and later passing in an element from the primary list.

在列表推导的双for循环中,迭代curried函数列表,并且每个函数应用于主列表(lst)的每个元素。 Currying允许您通过传入一些参数(例如1,-1,-2,2)并稍后从主列表传入元素来计算新值。

Tuples are created, e.g (primary element, computed element). The conditional part of the list comprehension, substitutes 0 for computed elements not found in primary list.

创建元组,例如(主要元素,计算元素)。列表推导的条件部分将0替换为主列表中未找到的计算元素。

See also this implementation of the curry function.

另见咖喱功能的这种实现。

#1


1  

In case x is some data, like words or random values and we need to recombine it we could use reindexing mechanism in numpy.

如果x是某些数据,如单词或随机值,我们需要重新组合它,我们可以在numpy中使用重建索引机制。

Substituted by zero version

由零版本替代

x = np.array([1,2,3,4])
wz = 2
zero = 0

Lets build indexing matrix.

让我们构建索引矩阵。

ri = np.arange(-wz,wz+1)+np.arange(x.shape[0]).reshape(-1,1)
print(ri) 

Output:

  [[-2, -1,  0,  1,  2],
   [-1,  0,  1,  2,  3],
   [ 0,  1,  2,  3,  4],
   [ 1,  2,  3,  4,  5]

Now if we add zero to x as last element we can replace wrong indexes with it index.

现在,如果我们将x作为最后一个元素添加零,我们可以用它索引替换错误的索引。

np.place(ri,(ri<0)|(ri>x.shape[0]),x.shape[0]) #replace wrong indexes
np.vstack((
    np.hstack((x,[zero]))[ri].reshape(1,-1),#extending x with zero and reindexing 
    np.tile(x,2*wz+1)) #repeating basic `x` to each window position
    )#.T #uncomment .T to make it vertical   

Output:

 ([[0, 0, 1, 2, 3, 0, 1, 2, 3, 4, 1, 2, 3, 4, 0, 2, 3, 4, 0, 0],
   [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]])

Skipped version

The same idea, but in a slightly different order: produce a complete indexing matrix [window_index,x_index] then exclude the wrong pairs, and finally re-indexing 'x'.

相同的想法,但顺序略有不同:产生一个完整的索引矩阵[window_index,x_index]然后排除错误的对,最后重新索引'x'。

x = np.array([1,2,3,4])
wz = 2
ri = np.vstack((
    (np.arange(-wz,wz+1)+np.arange(x.shape[0]).reshape(-1,1)).ravel(),#same index matrix flaten 
    np.tile(np.arange(x.shape[0]),2*wz+1) #repeating `x` indexes to each window position
    )) 
x[ri[:,(ri[0]>=0)&(ri[0]<x.shape[0])]]#.T #uncomment .T to make it vertical   

Output:

 [[1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4],
  [3, 4, 1, 3, 4, 1, 2, 3, 4, 1, 2, 4, 1, 2]]

Update 1 (error fix) exclude zero from window to avoid pair duplication.

更新1(错误修复)从窗口中排除零以避免对重复。

x = np.array([1,2,3,4])
wz = 2
ri = np.vstack(((
        np.hstack(( np.arange(-wz,0), #remove zero from window
                    np.arange(1,wz+1)))+
        np.arange(x.shape[0]).reshape(-1,1)).ravel(), #same index matrix flaten 
    np.tile(np.arange(x.shape[0]),2*wz) #repeating `x` indexes to each window position
    )) 
x[ri[:,(ri[0]>=0)&(ri[0]<x.shape[0])]]#.T #uncomment .T to make it vertical   

Output:

  [[2, 3, 1, 3, 4, 1, 2, 4, 2, 3],
   [3, 4, 2, 3, 4, 1, 2, 3, 1, 2]]

Check documentation on used functions np.arange, np.reshape, np.place, np.hstack, broadcasting rules and indexing.

检查有关已使用函数np.arange,np.reshape,np.place,np.hstack,广播规则和索引的文档。

#2


2  

Here is a Numpythonic approach:

这是一个Numpythonic方法:

In [23]: a = np.array([1,2,3,4])
In [24]: arr = np.hstack((a-1, a+1, a - 2, a+ 2))
In [25]: mask = ~np.in1d(arr, a)
In [26]: arr[mask] = 0
In [27]: np.column_stack((np.tile(a, 4), arr))
Out[27]: 
array([ [1, 0],
        [2, 1],
        [3, 2],
        [4, 3],
        [1, 2],
        [2, 3],
        [3, 4],
        [4, 0],
        [1, 0],
        [2, 0],
        [3, 1],
        [4, 2],
        [1, 3],
        [2, 4],
        [3, 0],
        [4, 0]])

#3


0  

The numpy approach is favorable, but here is a functional approach for those interested:

numpy方法是有利的,但对于那些感兴趣的人来说,这是一种功能性的方法:

Given

import functools as ft


# Helper function
def curry(f):
    @ft.wraps(f)
    def wrapped(arg):
        try:
            return f(arg)
        except TypeError:
            return curry(ft.wraps(f)(ft.partial(f, arg)))
    return wrapped

Code

lst = [1, 2, 3, 4]
c = curry(lambda x, y: x + y)
funcs = [c(-1), c(1), c(-2), c(2)]
set_ = set(lst)


[[x, 0] if fn(x) not in set_ else [x, fn(x)] for fn in funcs for x in lst]

Output

[[1, 0],
 [2, 1],
 [3, 2],
 [4, 3],
 [1, 2],
 [2, 3],
 [3, 4],
 [4, 0],
 [1, 0],
 [2, 0],
 [3, 1],
 [4, 2],
 [1, 3],
 [2, 4],
 [3, 0],
 [4, 0]]

Details

In the double for loops of the list comprehension, a list of curried functions is iterated, and each function is applied to every element of the primary list (lst). Currying allows you to compute new values by passing in some argument (e.g. 1, -1, -2, 2) and later passing in an element from the primary list.

在列表推导的双for循环中,迭代curried函数列表,并且每个函数应用于主列表(lst)的每个元素。 Currying允许您通过传入一些参数(例如1,-1,-2,2)并稍后从主列表传入元素来计算新值。

Tuples are created, e.g (primary element, computed element). The conditional part of the list comprehension, substitutes 0 for computed elements not found in primary list.

创建元组,例如(主要元素,计算元素)。列表推导的条件部分将0替换为主列表中未找到的计算元素。

See also this implementation of the curry function.

另见咖喱功能的这种实现。