How to compare two numpy arrays and add the missing values to the other

Date: 2022-03-04 19:13:48

I have two numpy arrays with different numbers of rows. I want to add the rows that the bigger array has and the smaller array lacks to the smaller array, keeping only the 0th element of each added row and setting the 1st element to 0.

For example:

a = [[2, 4], [4, 5], [8, 9], [7, 5]]

b = [[2, 5], [4, 6]]

After adding the missing elements to b, b would become:

b = [[2, 5], [4, 6], [8, 0], [7, 0]]

I have worked out part of the logic, but some values get added redundantly because I cannot check whether an element has already been added to b.

Secondly, I am doing this with an additional array c, which is a copy of b, and applying the operations to c. It would be very helpful if somebody could show me how to do it without the third array c.

import numpy as np

a = [[2,3],[4,5],[6,8], [9,6]]

b = [[2,3],[4,5]]

a = np.array(a)
b = np.array(b)
c = np.array(b)

for i in range(len(b)):
    for j in range(len(a)):
        if a[j,0] == b[i,0]:
            print("matched ")
        else:
            print("not matched")
            c= np.insert(c, len(c), [a[j,0], 0], axis = 0)
print(c)

4 solutions

#1


2  

# Explanation: a basic set operation finds the missing elements
# (here a and b are plain Python lists)
c = set(i[0] for i in a) - set(i[0] for i in b)
# c now holds only the first elements that are missing from b;
# append each of them with 0 as the second element
for i in c:
    b.append([i, 0])

Output -

[[2, 5], [4, 6], [8, 0], [7, 0]]

Edit -

But since they are numpy arrays, you can do this in just two lines, without using c as an intermediate:

for i in set(a[:, 0]) - (set(b[:, 0])):
    b = np.append(b, [[i, 0]], axis = 0)

Output -

array([[2, 5],
       [4, 6],
       [8, 0],
       [7, 0]])
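
One caveat with the set-based loop: iteration order over a Python set is not guaranteed, so the appended rows may not come out in the order they have in a, and np.append copies the whole array on every pass. The following is a minimal sketch, not the answerer's code, that keeps the order of a and appends once at the end. The names seen and new_rows are illustrative only.

```python
import numpy as np

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[2, 5], [4, 6]])

# Keys (first-column values) already present in b.
seen = set(b[:, 0].tolist())

# Walk a in its original order so the appended rows keep that order,
# and record each new key so a repeated key in a is added only once.
new_rows = []
for key in a[:, 0].tolist():
    if key not in seen:
        seen.add(key)
        new_rows.append([key, 0])

# Append everything in one call instead of growing b inside the loop.
if new_rows:
    b = np.concatenate((b, np.array(new_rows, dtype=b.dtype)), axis=0)

print(b)
```

Tracking the keys in seen is also one way to answer the original worry about values being added more than once.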

#2


1  

You can use np.in1d to check which first-column values of a already appear in the first column of b, which gives a boolean mask. Inverting the mask selects the rows of a that are missing from b, and multiplying by [1,0] zeroes their second column. This gives a vectorized approach, as shown below -

np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))

Sample run -

In [47]: a
Out[47]: 
array([[2, 4],
       [4, 5],
       [8, 9],
       [7, 5]])

In [48]: b
Out[48]: 
array([[8, 7],
       [4, 6]])

In [49]: np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Out[49]: 
array([[8, 7],
       [4, 6],
       [2, 0],
       [7, 0]])
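
The same idea can be wrapped in a small function. This is a sketch under a few assumptions: the function name add_missing_keys is hypothetical, and it uses np.isin, which the NumPy documentation recommends over np.in1d for new code. Instead of multiplying by [1,0] it fills a zero array, which should also work when b has more than two columns.

```python
import numpy as np

def add_missing_keys(a, b):
    """Return b plus one [key, 0, ...] row for each row of a whose key is not in b."""
    missing = ~np.isin(a[:, 0], b[:, 0])      # True where a's key is absent from b
    extra = np.zeros((missing.sum(), b.shape[1]), dtype=b.dtype)
    extra[:, 0] = a[missing, 0]               # keep the key, leave the rest as 0
    return np.vstack((b, extra))

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[8, 7], [4, 6]])
result = add_missing_keys(a, b)
print(result)
```

Note that a key occurring several times in a would be appended once per occurrence; deduplicating first (for example with np.unique) is left out of this sketch.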

#3


1  

First we should clear up one misconception. c does not have to be a copy. A new variable assignment is sufficient.

c = b
...
    c= np.insert(c, len(c), [a[j,0], 0], axis = 0)

np.insert does not modify any of its inputs; it returns a new array. The c = ... statement then assigns that new array to c, replacing the earlier binding. So the initial c assignment just makes the iteration easier to write.
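
A short check of that behaviour, as a sketch with made-up values: after a plain assignment c and b are the same object, and np.insert leaves b untouched while c is rebound to the new array.

```python
import numpy as np

b = np.array([[2, 3], [4, 5]])
c = b                       # plain assignment: c and b name the same array
same_before = c is b

c = np.insert(c, len(c), [6, 0], axis=0)   # builds a new array and rebinds c
same_after = c is b

print(same_before, same_after)   # True False
print(b.shape, c.shape)          # b is unchanged, c has one more row
```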

Since you are adding this new [a[j,0],0] row at the end, you could use concatenate (the underlying function used by append and the stack functions). Note that the new row has to be 2D, like c:

c = np.concatenate((c, [[a[j,0], 0]]), axis=0)

That won't change the run time much. It is better to find all the a[j] rows first and add them in a single call.

In this case you want to add a[2,0] and a[3,0]. Leaving aside, for the moment, the question of how we find [2,3], we can do:

In [595]: a=np.array([[2,3],[4,5],[6,8],[9,6]])
In [596]: b=np.array([[2,3],[4,5]])
In [597]: ind = [2,3]

An assign and fill approach would look like:

In [605]: c = np.zeros_like(a)   # target array  
In [607]: c[0:b.shape[0],:] = b     # fill in the b values
In [608]: c[b.shape[0]:,0] = a[ind,0]    # fill in the selected a column

In [609]: c
Out[609]: 
array([[2, 3],
       [4, 5],
       [6, 0],
       [9, 0]])

A variation would be to construct a temporary array with the new a values and concatenate it:

In [613]: a1 = np.zeros((len(ind),2),a.dtype) 
In [614]: a1[:,0] = a[ind,0]   
In [616]: np.concatenate((b,a1),axis=0)
Out[616]: 
array([[2, 3],
       [4, 5],
       [6, 0],
       [9, 0]])

I'm using the a1 create-and-fill approach because I'm too lazy to figure out how to concatenate a[ind,0] with enough 0s to make the same thing. :)
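
For what it's worth, one possible way to pair a[ind,0] with a matching column of zeros is np.column_stack. This is a sketch of that alternative, not the approach used above:

```python
import numpy as np

a = np.array([[2, 3], [4, 5], [6, 8], [9, 6]])
b = np.array([[2, 3], [4, 5]])
ind = [2, 3]

# Pair the selected keys with a zero column of the same length and dtype.
a1 = np.column_stack((a[ind, 0], np.zeros(len(ind), dtype=a.dtype)))
c = np.concatenate((b, a1), axis=0)
print(c)
```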

As Divakar shows, np.in1d is a handy way of finding the matches:

In [617]: np.in1d(a[:,0],b[:,0])
Out[617]: array([ True,  True, False, False], dtype=bool)

In [618]: np.nonzero(~np.in1d(a[:,0],b[:,0]))
Out[618]: (array([2, 3], dtype=int32),)

In [619]: np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
Out[619]: array([2, 3], dtype=int32)

In [620]: ind=np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]

If you don't care about the order, a[ind,0] can also be obtained with np.setdiff1d(a[:,0],b[:,0]) (the values will be sorted).
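
A small sketch of the setdiff1d route, using the arrays from the question. The variable names missing and extra are illustrative. Because the result is sorted, 7 ends up before 8 here even though 8 comes first in a.

```python
import numpy as np

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[2, 5], [4, 6]])

# Sorted, unique first-column values that are in a but not in b.
missing = np.setdiff1d(a[:, 0], b[:, 0])

# Give each missing key a 0 in the second column and append to b.
extra = np.column_stack((missing, np.zeros_like(missing)))
result = np.concatenate((b, extra), axis=0)
print(result)
```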

#4


0  

Assuming you are working with a one-dimensional array:

import numpy as np
a = np.linspace(1, 90, 90)
b = np.array([1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,
             21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,
             40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,
             57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,
             77,78,79,80,81,82,84,85,86,87,88,89,90])

m_num = np.setxor1d(a, b).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num), m_num))

This also works in a 2D space:

t1 = np.reshape(a, (10, 9))
t2 = np.reshape(b, (10, 8))
m_num2 = np.setxor1d(t1, t2).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num2), m_num2))
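
One thing to keep in mind: np.setxor1d returns the values that occur in exactly one of the two arrays, so it only equals "missing from b" when b contains nothing outside a. The sketch below, with made-up data, contrasts it with np.setdiff1d, which is one-directional.

```python
import numpy as np

a = np.arange(1, 11)                        # 1 .. 10
b = np.array([1, 2, 4, 5, 7, 8, 10, 42])    # 42 does not occur in a

xor = np.setxor1d(a, b)    # values in exactly one of the arrays
diff = np.setdiff1d(a, b)  # values of a that are missing from b

print(xor)    # contains 3, 6, 9 and also 42
print(diff)   # contains 3, 6, 9 only
```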
