How to perform an XOR operation on every X consecutive rows of a numpy array in Python

I have a Python numpy array like this (let's call it my_numpy_array), and it can go up to a million values:

>>> my_numpy_array
array([[1, 0, 0, 0, 0, 0, 1, 0],
       [1, 1, 0, 0, 0, 0, 1, 1],
       [1, 1, 0, 0, 0, 0, 1, 0],
       . . . . . . . . . . . . 
       . . . . . . . . . . . . 
       [0, 0, 0, 0, 0, 0, 1, 1],
       [1, 1, 1, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 1, 0, 0, 0, 1, 1]])

and another numpy array like this (call it second_array), which is not so huge:

array([[1, 1, 1, 0, 0, 0, 0, 1],      #row 1
       [1, 1, 0, 0, 0, 0, 1, 0],      #row 2
       [1, 1, 1, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1, 1],
       [1, 1, 1, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1, 1],
       [1, 1, 1, 0, 0, 0, 0, 1],
       [1, 1, 0, 0, 0, 0, 1, 0],
       .......................        #row 9
       [1, 1, 1, 0, 0, 0, 0, 1]])     #can be any number of ROWS!!!

I want to XOR these 9 rows (9 is X here, i.e. it can be any number) with every 9 consecutive rows in my_numpy_array. I tried working with np.logical_xor() but couldn't get it to do what I wanted.

Also note that the number of rows in my_numpy_array may not be a multiple of 9 (i.e. X). Say the number of rows is 2701: for the first 2700 rows there is no problem, but the last row will be XOR-ed with only the first row of second_array.

If it were 2702, then the last two rows would be XOR-ed with only the first two rows of second_array.
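
The behavior described above can be sketched with a plain loop. This is only a slow reference version to pin down the expected result, not a suggested solution for a million rows; the function name xor_blocks_loop and the small arrays are made up for illustration.

```python
import numpy as np

def xor_blocks_loop(data, pattern):
    """XOR `pattern` onto each consecutive block of rows in `data` (hypothetical helper)."""
    n_rows = data.shape[0]
    block = pattern.shape[0]
    out = np.empty_like(data)
    for start in range(0, n_rows, block):
        stop = min(start + block, n_rows)
        # The last block may be short, so only the first (stop - start)
        # rows of the pattern are used for it.
        out[start:stop] = data[start:stop] ^ pattern[:stop - start]
    return out

# 5 rows against a 2-row pattern: the fifth row only meets pattern row 0.
data = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 0]])
pattern = np.array([[1, 1], [0, 1]])
result = xor_blocks_loop(data, pattern)
```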

Any help is much appreciated! Thanks.

2 Solutions

#1

If the XOR filter is just one row, you can simply use numpy broadcasting:

import numpy as np

arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
                   [1, 1, 0, 0, 0, 0, 1, 1],
                   [1, 1, 0, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0, 1, 1],
                   [1, 1, 1, 0, 0, 0, 1, 1],
                   [0, 0, 0, 0, 0, 0, 1, 1],
                   [0, 0, 1, 0, 0, 0, 1, 1]])

filt = np.asarray([1, 1, 1, 0, 0, 0, 0, 1])

res = arr ^ filt
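
To see what the broadcast does, here is a small check on the first two rows of the example. The name `same` is only for illustration; it holds the row-by-row result that the one-line broadcast is expected to match.

```python
import numpy as np

arr = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
                [1, 1, 0, 0, 0, 0, 1, 1]])
filt = np.array([1, 1, 1, 0, 0, 0, 0, 1])

# filt has shape (8,); NumPy treats it as (1, 8) and stretches it over every row of arr.
res = arr ^ filt

# Explicit row-by-row version of the same computation, for comparison.
same = np.array([row ^ filt for row in arr])
```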

If the filter has more than one row, it doesn't look so pretty:

arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
                   [1, 1, 0, 0, 0, 0, 1, 1],
                   [1, 1, 0, 0, 0, 0, 1, 0],
                   [1, 1, 1, 0, 0, 0, 1, 1],
                   [0, 0, 0, 0, 0, 0, 1, 1],
                   [1, 1, 1, 0, 0, 0, 1, 1],
                   [0, 0, 0, 0, 0, 0, 1, 1]])

filt = np.asarray([[1, 1, 1, 0, 0, 0, 0, 1],
                  [0, 0, 1, 0, 1, 0, 0, 0]])


filt_rows = filt.shape[0]
arr_rows = arr.shape[0]

res = arr ^ np.tile(filt, (1 + arr_rows // filt_rows ,1))[:arr_rows,:]

The filter rows are tiled into an array at least as large as your my_numpy_array and then cut back by slicing, so both arrays have the same shape. I'm not sure how well this scales to larger sizes, since it builds a full-size temporary array and doesn't work in place.
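
If writing the result in place matters, one possible variation is to pass `out=` to np.bitwise_xor and pick the filter rows with a modulo index. This is a sketch under the assumption that both arrays share the same integer dtype; xor_cyclic_inplace is a made-up name, and the indexed filter is still a temporary of the same shape as the data.

```python
import numpy as np

def xor_cyclic_inplace(arr, filt):
    """XOR filt onto arr in place, cycling through filt's rows (hypothetical helper)."""
    # Row i of arr is paired with row i % filt_rows of filt.
    row_idx = np.arange(arr.shape[0]) % filt.shape[0]
    # filt[row_idx] is a temporary with arr's shape; only the result is written into arr.
    np.bitwise_xor(arr, filt[row_idx], out=arr)
    return arr

arr = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
                [1, 1, 0, 0, 0, 0, 1, 1],
                [1, 1, 0, 0, 0, 0, 1, 0]])
filt = np.array([[1, 1, 1, 0, 0, 0, 0, 1],
                 [0, 0, 1, 0, 1, 0, 0, 0]])

original = arr.copy()
xor_cyclic_inplace(arr, filt)
```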

#2

Method 1: repeat

import numpy as np
x = np.ones((271, 10))
y = np.zeros((9, 10))
# np.tile stacks whole copies of y; np.repeat(..., axis=0) would duplicate each row back to back instead
np.logical_xor(x, np.tile(y, (x.shape[0]//y.shape[0]+1, 1))[:x.shape[0],:])

Method 1 is to repeat y enough times to cover x and then keep only the first x.shape[0] rows.
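
One thing worth checking when building the repeated array is the row order, because np.repeat and np.tile differ here. The tiny array below is only for illustration: np.repeat with axis=0 duplicates each row back to back, while np.tile stacks whole copies, which is the cyclic order the question asks for.

```python
import numpy as np

y = np.array([[0, 0], [1, 1], [2, 2]])

repeated = np.repeat(y, 2, axis=0)  # rows 0, 0, 1, 1, 2, 2
tiled = np.tile(y, (2, 1))          # rows 0, 1, 2, 0, 1, 2
```

With an all-zero y the two orders give the same XOR result, so the difference only shows up with a non-trivial pattern.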

Method 2: reshape

def method2(x, y):
    ry, ly = y.shape
    rx, lx = x.shape
    k = rx // ry  # number of complete blocks of ry rows
    # View the complete blocks as (k, ry, lx); y with shape (ry, ly) broadcasts over the first axis.
    arr1 = np.logical_xor(x[:k*ry].reshape((k, ry, lx)), y)
    arr2 = np.logical_xor(x[k*ry:], y[:rx%ry, :]) # remainder part
    # Flatten the blocks back to 2-D and glue the remainder rows on.
    return np.append(arr1.reshape((k*ry, lx)), arr2, axis=0)

For method 2, we split the original x into two parts: the part whose row count is a multiple of y's row count, and the remainder. Taking OP's problem as an example, we split 2702 rows into 2700 rows and 2 rows, because 2700 is a multiple of 9 and 2 is the remainder. (The purpose of the messy slices inside square brackets, like [:k*ry], is to do this split.)

For the 2700-row part, we can reshape it as a 3-dimensional tensor with shape (300, 9, C), where C is the number of columns. y has shape (9, C), so during np.logical_xor it is broadcast to the same size (300, 9, C), one copy per block of 9 rows. See the NumPy broadcasting documentation for more information.

We also perform the XOR for the 2-row part and then use np.append to glue the two results together.
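
A way to gain confidence in a reshape-based approach is to compare it with a row-by-row computation on random 0/1 data, since an all-ones x and all-zeros y would hide any row-ordering mistake. The sketch below assumes the complete blocks are viewed as (blocks, rows per block, columns); xor_blocks_reshape and the array sizes are made up for illustration.

```python
import numpy as np

def xor_blocks_reshape(x, y):
    """XOR y onto each consecutive block of y.shape[0] rows of x (hypothetical helper)."""
    ry, ncols = y.shape
    k, rem = divmod(x.shape[0], ry)
    # Complete blocks: view as (k, ry, ncols) so y broadcasts over the k blocks.
    full = (x[:k * ry].reshape(k, ry, ncols) ^ y).reshape(k * ry, ncols)
    # Leftover rows pair with the first `rem` rows of y.
    tail = x[k * ry:] ^ y[:rem]
    return np.concatenate((full, tail), axis=0)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(11, 4))
y = rng.integers(0, 2, size=(3, 4))

result = xor_blocks_reshape(x, y)
# Cross-check: row i of x should meet row i % 3 of y.
expected = np.array([x[i] ^ y[i % 3] for i in range(11)])
```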

Timing: for large x, method 2 is faster

x = np.ones((2720000, 10))
y = np.zeros((9, 10))

%timeit method2(x, y)
52.5 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.logical_xor(x, np.repeat(y, x.shape[0]//y.shape[0]+1, axis=0)[:x.shape[0],:])
175 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The repeat approach (method 1) creates a new array the same size as x, while the broadcasting/reshape approach (method 2) does not. Thus, method 1 can cost more time when x is big.
