I have a Python numpy array like the one below. Let's call it my_numpy_array. It can hold up to a million values!
>>> my_numpy_array
array([[1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
. . . . . . . . . . . .
. . . . . . . . . . . .
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1]])
and another numpy array like this, call it second_array (which is not as large):
array([[1, 1, 1, 0, 0, 0, 0, 1], #row 1
[1, 1, 0, 0, 0, 0, 1, 0], #row 2
[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
....................... #row 9
[1, 1, 1, 0, 0, 0, 0, 1]]) #can be any number of ROWS!!!
I want to XOR these 9 rows (call this number X; it can be any number) with every block of 9 consecutive rows in my_numpy_array. I tried working with np.logical_xor() but couldn't get it to do what I wanted.
Also note that the number of rows in my_numpy_array may not be a multiple of 9 (i.e. X). Say the number of rows is 2701:
the first 2700 rows are no problem, but the last one should be XOR-ed with only the first row of second_array.
If it were 2702, the last two rows would be XOR-ed with the first two rows of second_array.
Any help is much appreciated! Thanks
2 answers
#1
0
If the XOR filter is just one row, you can simply use numpy broadcasting:
import numpy as np
arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1]])
filt = np.asarray([1, 1, 1, 0, 0, 0, 0, 1])
res = arr ^ filt
If the filter has more than one row, it is not as pretty:
arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1]])
filt = np.asarray([[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 1, 0, 0, 0]])
filt_rows = filt.shape[0]
arr_rows = arr.shape[0]
res = arr ^ np.tile(filt, (1 + arr_rows // filt_rows ,1))[:arr_rows,:]
The filter rows are tiled into an array with at least as many rows as my_numpy_array and then cut back by slicing, so both arrays have the same shape. I'm not sure how well this scales to larger sizes, since it makes a full-size copy of the tiled filter and does not work in place.
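As a cross-check of the tile-and-slice idea, here is a minimal sketch that pairs rows by modular indexing instead. The variable names row_idx, res_indexed and res_tiled are made up for this example. Note that filt[row_idx] also builds a full-size copy, so it does not solve the memory concern; it only makes the row pairing explicit.

```python
import numpy as np

arr = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
                [1, 1, 0, 0, 0, 0, 1, 1],
                [1, 1, 0, 0, 0, 0, 1, 0],
                [1, 1, 1, 0, 0, 0, 1, 1],
                [0, 0, 0, 0, 0, 0, 1, 1],
                [1, 1, 1, 0, 0, 0, 1, 1],
                [0, 0, 0, 0, 0, 0, 1, 1]])
filt = np.array([[1, 1, 1, 0, 0, 0, 0, 1],
                 [0, 0, 1, 0, 1, 0, 0, 0]])

# Row i of arr is paired with row (i % number_of_filter_rows) of filt.
row_idx = np.arange(arr.shape[0]) % filt.shape[0]
res_indexed = arr ^ filt[row_idx]

# Tile-and-slice version, for comparison.
tiled = np.tile(filt, (1 + arr.shape[0] // filt.shape[0], 1))[:arr.shape[0], :]
res_tiled = arr ^ tiled

print(np.array_equal(res_indexed, res_tiled))
```

Both variants should agree on every row, including the last one, which only meets the first filter row.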
#2
0
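Method 1: repeat
import numpy as np
x = np.ones((271, 10))
y = np.zeros((9, 10))
np.logical_xor(x, np.tile(y, (x.shape[0]//y.shape[0]+1, 1))[:x.shape[0],:])
Method 1 repeats y as a whole block enough times to cover x and then keeps only the first x.shape[0] rows. np.tile is used rather than np.repeat because np.repeat(y, n, axis=0) duplicates each row n times consecutively, which matches the cyclic pairing in the question only when all rows of y are identical (as in this all-zeros example).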
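The row order matters here, so it may help to see how np.repeat and np.tile differ along axis 0. This is a toy sketch with a 2-row y that is not from the original post.

```python
import numpy as np

y = np.array([[1, 0],
              [0, 1]])

# np.repeat duplicates each row consecutively: row 0, row 0, row 1, row 1.
repeated = np.repeat(y, 2, axis=0)

# np.tile stacks the whole block again: row 0, row 1, row 0, row 1.
tiled = np.tile(y, (2, 1))

print(repeated.tolist())  # [[1, 0], [1, 0], [0, 1], [0, 1]]
print(tiled.tolist())     # [[1, 0], [0, 1], [1, 0], [0, 1]]
```

Only the tiled order lines up row i of the big array with row i % 2 of y, which is the pairing the question describes.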
Method 2: reshape
def method2(x, y):
    ry, ly = y.shape
    rx, lx = x.shape
    full = rx // ry * ry  # largest multiple of ry that fits in rx
    # view the first `full` rows as (rx // ry) blocks of ry rows each
    arr1 = np.logical_xor(x[:full].reshape((rx // ry, ry, ly)), y)
    arr2 = np.logical_xor(x[full:], y[:rx % ry, :])  # remainder part
    return np.append(arr1.reshape((full, ly)), arr2, axis=0)
For method 2, we split the original x into two parts: the rows up to the largest multiple of y's row count, and the remainder. Taking OP's problem as an example, we split 2702 rows into 2700 rows and 2 rows, because 2700 is a multiple of 9 and 2 is the remainder. (The purpose of the messy expression rx//ry*ry inside the square brackets is to do this split.)
For the 2700-row part, we can reshape it as a 3-dimensional tensor with shape (300, 9, ly), so that each of the 300 blocks holds 9 consecutive rows. y has shape (9, ly), so when np.logical_xor is applied, y is broadcast against every block as if it had shape (300, 9, ly). See the numpy broadcasting documentation for more information.
We also perform the XOR for the 2-row part and then use np.append to glue the two results together.
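Because an all-ones x and all-zeros y cannot reveal a wrong row pairing, it seems worth checking the reshape idea on random data against a plain loop. The sketch below assumes x and y have the same number of columns; the function names xor_cyclic_loop and xor_cyclic_reshape are made up for this example.

```python
import numpy as np

def xor_cyclic_loop(x, y):
    """Reference: row i of x is XORed with row i % len(y) of y."""
    out = np.empty(x.shape, dtype=bool)
    for i in range(x.shape[0]):
        out[i] = np.logical_xor(x[i], y[i % y.shape[0]])
    return out

def xor_cyclic_reshape(x, y):
    """Same pairing via reshape + broadcasting, plus a remainder part."""
    ry, ncols = y.shape
    rx = x.shape[0]
    full = rx // ry * ry
    # Each block along axis 0 holds ry consecutive rows of x.
    blocks = x[:full].reshape(rx // ry, ry, ncols)
    head = np.logical_xor(blocks, y).reshape(full, ncols)
    # Leftover rows are paired with the first rows of y.
    tail = np.logical_xor(x[full:], y[:rx - full])
    return np.concatenate([head, tail], axis=0)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(2702, 8))
y = rng.integers(0, 2, size=(9, 8))
same = np.array_equal(xor_cyclic_reshape(x, y), xor_cyclic_loop(x, y))
print(same)
```

The block axis has to come first in the reshape: with C-ordered data, shape (blocks, ry, ncols) keeps consecutive rows together, whereas putting the block count last would scramble which element meets which filter entry.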
Timing: for large x, method 2 is faster
x = np.ones((2720000, 10))
y = np.zeros((9, 10))
%timeit method2(x, y)
52.5 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.logical_xor(x, np.repeat(y, x.shape[0]//y.shape[0]+1, axis=0)[:x.shape[0],:])
175 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Repeating y (with np.repeat or np.tile) creates a new array at least as large as x, while the reshape/broadcasting method does not. Thus, the repeat approach can cost more time when x is big.
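The extra cost discussed above comes from building a full-size copy of the filter. If x may be overwritten, a rough sketch of an in-place variant could look like the following. The function name xor_cyclic_inplace is made up for this example, and it assumes x is a C-contiguous integer or boolean array with the same dtype and column count as y.

```python
import numpy as np

def xor_cyclic_inplace(x, y):
    """XOR y cyclically into the rows of x, modifying x itself."""
    if not x.flags.c_contiguous:
        raise ValueError("x must be C-contiguous so the reshape is a view")
    ry, ncols = y.shape
    rx = x.shape[0]
    full = rx // ry * ry
    # For a C-contiguous x this reshape is a view, so writes land in x.
    blocks = x[:full].reshape(rx // ry, ry, ncols)
    np.bitwise_xor(blocks, y, out=blocks)
    # Leftover rows are paired with the first rows of y.
    np.bitwise_xor(x[full:], y[:rx - full], out=x[full:])
    return x

x = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 0]])
y = np.array([[1, 1], [0, 1]])
xor_cyclic_inplace(x, y)
print(x.tolist())  # [[0, 1], [1, 0], [1, 1], [0, 0], [0, 1]]
```

No array of the size of x is allocated here; the trade-off is that the original contents of x are lost.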