Filter a numpy array based on the largest value

I have a numpy array holding 4-dimensional vectors in the format (x, y, z, w).

The size of the array is N x 4. The (x, y, z) values are spatial locations and w holds a particular measurement taken at that location. There can be multiple measurements associated with the same (x, y, z) position (all values are floats).

What I would like to do is filter the array so that I get a new array containing only the maximum measurement for each (x, y, z) position.

So if my data is like:

x, y, z, w1
x, y, z, w2
x, y, z, w3

where w1 is greater than w2 and w3, the filtered data would be:

x, y, z, w1

So more concretely, say I have data like:

[[ 0.7732126   0.48649481  0.29771819  0.91622924]
 [ 0.7732126   0.48649481  0.29771819  1.91622924]
 [ 0.58294263  0.32025559  0.6925856   0.0524125 ]
 [ 0.58294263  0.32025559  0.6925856   0.05 ]
 [ 0.58294263  0.32025559  0.6925856   1.7 ]
 [ 0.3239913   0.7786444   0.41692853  0.10467392]
 [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
 [ 0.13536096  0.60319054  0.82018125  0.10445047]
 [ 0.1877724   0.96060999  0.39697999  0.59078612]]

This should return

[[ 0.7732126   0.48649481  0.29771819  1.91622924]
 [ 0.58294263  0.32025559  0.6925856   1.7 ]
 [ 0.3239913   0.7786444   0.41692853  0.10467392]
 [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
 [ 0.13536096  0.60319054  0.82018125  0.10445047]
 [ 0.1877724   0.96060999  0.39697999  0.59078612]]

5 solutions

#1

This is convoluted, but it is probably as good as you are going to get using numpy only...

First, we use lexsort to put all entries with the same coordinates together. With a being your sample array:

>>> perm = np.lexsort(a[:, 3::-1].T)
>>> a[perm]
array([[ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.05      ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.0524125 ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.7732126 ,  0.48649481,  0.29771819,  0.91622924],
       [ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924]])

Note that by reversing the axis, we are sorting by x, breaking ties with y, then z, then w.

Because it is the maximum we are looking for, we just need to take the last entry in every group, which is a pretty straightforward thing to do:

>>> a_sorted = a[perm]
>>> # a row is the last of its group if ANY coordinate differs from the next row
>>> last = np.concatenate((np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1),
                           [True]))
>>> a_unique_max = a_sorted[last]
>>> a_unique_max
array([[ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924]])

If you would rather not have the output sorted, but keep the rows in the order in which they appeared in the original array, you can also get that with the aid of perm:

>>> a_unique_max[np.argsort(perm[last])]
array([[ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612]])
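
For convenience, the steps above could be collected into one function. This is only a sketch under a few assumptions: the name `max_per_position` is made up for illustration, the input is a non-empty N x 4 array, and group boundaries are detected with `np.any` so that positions sharing only some of their coordinates are still treated as distinct.

```python
import numpy as np


def max_per_position(a):
    """Keep, for each unique (x, y, z), the row with the largest w.

    Hypothetical helper; assumes `a` is a non-empty array of shape (N, 4).
    """
    a = np.asarray(a)
    # lexsort treats the LAST key as the primary one, so reversing the
    # columns sorts by x, then y, then z, then w.
    perm = np.lexsort(a[:, ::-1].T)
    a_sorted = a[perm]
    # A row closes its group when any coordinate differs from the next row;
    # the very last row always closes a group.
    last = np.concatenate(
        (np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1), [True])
    )
    # perm[last] holds the original positions of the kept rows; sorting by
    # them restores the order in which those rows appeared in the input.
    return a_sorted[last][np.argsort(perm[last])]


if __name__ == "__main__":
    demo = np.array([[1.0, 1.0, 1.0, 5.0],
                     [1.0, 1.0, 2.0, 3.0],
                     [1.0, 1.0, 1.0, 7.0]])
    print(max_per_position(demo))
```

Note that the rows come back ordered by where each group's maximum sat in the input, which is not necessarily where the group first appeared.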

This only works for the maximum, since it comes as a by-product of the sorting. If you are after a different function, say the product of all entries with the same coordinates, you could do something like:

>>> first = np.concatenate(([True],
                            np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
>>> a_unique_prods = np.multiply.reduceat(a_sorted, np.nonzero(first)[0])

You will then have to play around a little with these results to assemble your return array.
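
One way the assembly might look is to apply `reduceat` to the w column only and then stack the result next to the first row of coordinates from each group. This is a rough sketch, not the only way to do it; the function name `reduce_per_position` and its `ufunc` parameter are hypothetical, and the input is assumed to be a non-empty N x 4 array.

```python
import numpy as np


def reduce_per_position(a, ufunc=np.multiply):
    """Reduce w with `ufunc` over all rows sharing the same (x, y, z).

    Hypothetical helper; assumes `a` is a non-empty array of shape (N, 4)
    and `ufunc` is a binary NumPy ufunc such as np.multiply or np.add.
    """
    a = np.asarray(a, dtype=float)
    # Sort by x, then y, then z (lexsort uses the last key as primary).
    a_sorted = a[np.lexsort(a[:, 2::-1].T)]
    # A row opens a group when any coordinate differs from the previous row.
    first = np.concatenate(
        ([True], np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1))
    )
    starts = np.flatnonzero(first)
    # Reduce only the measurement column, one segment per group.
    reduced = ufunc.reduceat(a_sorted[:, 3], starts)
    # Pair each group's coordinates with its reduced measurement.
    return np.column_stack((a_sorted[first, :3], reduced))


if __name__ == "__main__":
    demo = np.array([[1, 1, 1, 2],
                     [1, 1, 2, 3],
                     [1, 1, 1, 5],
                     [1, 1, 2, 4]])
    print(reduce_per_position(demo))           # product of w per position
    print(reduce_per_position(demo, np.add))   # sum of w per position
```

The output here is sorted by coordinates, since the reduction is computed on the sorted array.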

#2

I see that you already got the pointer towards pandas in the comments. FWIW, here's how you can get the desired behavior, assuming you don't care about the final sort order since groupby changes it up.

In [14]: arr
Out[14]:
array([[ 0.7732126 ,  0.48649481,  0.29771819,  0.91622924],
       [ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.0524125 ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.05      ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612]])

In [15]: import pandas as pd

In [16]: pd.DataFrame(arr)
Out[16]:
          0         1         2         3
0  0.773213  0.486495  0.297718  0.916229
1  0.773213  0.486495  0.297718  1.916229
2  0.582943  0.320256  0.692586  0.052413
3  0.582943  0.320256  0.692586  0.050000
4  0.582943  0.320256  0.692586  1.700000
5  0.323991  0.778644  0.416929  0.104674
6  0.120800  0.748536  0.153567  0.450575
7  0.135361  0.603191  0.820181  0.104450
8  0.187772  0.960610  0.396980  0.590786

In [17]: pd.DataFrame(arr).groupby([0,1,2]).max().reset_index()
Out[17]:
          0         1         2         3
0  0.120800  0.748536  0.153567  0.450575
1  0.135361  0.603191  0.820181  0.104450
2  0.187772  0.960610  0.396980  0.590786
3  0.323991  0.778644  0.416929  0.104674
4  0.582943  0.320256  0.692586  1.700000
5  0.773213  0.486495  0.297718  1.916229
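
If the order does matter, a possible variation is to pass `sort=False` to `groupby`, which keeps the groups in order of first appearance, and then convert back to a numpy array. The sketch below assumes pandas is installed; the column names and the function name `max_per_position_pandas` are invented for illustration.

```python
import numpy as np
import pandas as pd


def max_per_position_pandas(arr):
    """Max w per (x, y, z), groups kept in order of first appearance.

    Hypothetical helper; assumes `arr` has shape (N, 4).
    """
    df = pd.DataFrame(np.asarray(arr), columns=["x", "y", "z", "w"])
    # sort=False stops groupby from sorting the group keys.
    grouped = df.groupby(["x", "y", "z"], sort=False)["w"].max()
    # reset_index turns the (x, y, z) index back into ordinary columns.
    return grouped.reset_index().to_numpy()


if __name__ == "__main__":
    demo = np.array([[2.0, 1.0, 1.0, 4.0],
                     [1.0, 1.0, 1.0, 9.0],
                     [2.0, 1.0, 1.0, 6.0]])
    print(max_per_position_pandas(demo))
```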

#3

You can start off by lex-sorting the input array to bring entries with identical first three elements next to each other. Then, create another 2D array to store the last column entries, such that the elements corresponding to each duplicate triplet go into the same row. Next, find the max along axis=1 for this 2D array to get the final max output for each unique triplet. Here's the implementation, assuming A as the input array -

# Lex sort A
sortedA = A[np.lexsort(A[:,:-1].T)]

# Mask of start of unique first three columns from A
start_unqA = np.append(True,~np.all(np.diff(sortedA[:,:-1],axis=0)==0,axis=1))

# Counts of unique first three columns from A
counts = np.bincount(start_unqA.cumsum()-1)
mask = np.arange(counts.max()) < counts[:,None]

# Group A's last column into rows based on uniqueness from first three columns
grpA = np.empty(mask.shape)
grpA.fill(np.nan)
grpA[mask] = sortedA[:,-1]

# Concatenate unique first three columns from A and 
# corresponding max values for each such unique triplet
out = np.column_stack((sortedA[start_unqA,:-1],np.nanmax(grpA,axis=1)))

Sample run -

In [75]: A
Out[75]: 
array([[ 1,  1,  1, 96],
       [ 1,  2,  2, 48],
       [ 2,  1,  2, 33],
       [ 1,  1,  1, 24],
       [ 1,  1,  1, 94],
       [ 2,  2,  2,  5],
       [ 2,  1,  1, 17],
       [ 2,  2,  2, 62]])

In [76]: sortedA
Out[76]: 
array([[ 1,  1,  1, 96],
       [ 1,  1,  1, 24],
       [ 1,  1,  1, 94],
       [ 2,  1,  1, 17],
       [ 2,  1,  2, 33],
       [ 1,  2,  2, 48],
       [ 2,  2,  2,  5],
       [ 2,  2,  2, 62]])

In [77]: out
Out[77]: 
array([[  1.,   1.,   1.,  96.],
       [  2.,   1.,   1.,  17.],
       [  2.,   1.,   2.,  33.],
       [  1.,   2.,   2.,  48.],
       [  2.,   2.,   2.,  62.]])
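
As a different technique from the padded 2D array above, the grouping could also be sketched with `np.unique(..., axis=0, return_inverse=True)` plus `np.maximum.at`. This is an alternative under assumptions, not the method used in this answer: it needs a NumPy version that supports the `axis` argument of `np.unique`, and the function name `max_per_position_unique` is made up for illustration.

```python
import numpy as np


def max_per_position_unique(A):
    """Max of the last column per unique triplet of the first three columns.

    Hypothetical helper; assumes `A` has shape (N, 4).
    """
    A = np.asarray(A, dtype=float)
    # coords: unique (x, y, z) rows; inverse: group index of every input row.
    coords, inverse = np.unique(A[:, :3], axis=0, return_inverse=True)
    # Flatten in case the installed NumPy returns a 2D inverse array.
    inverse = inverse.ravel()
    # Start every group at -inf, then take an unbuffered running maximum.
    best = np.full(len(coords), -np.inf)
    np.maximum.at(best, inverse, A[:, 3])
    return np.column_stack((coords, best))


if __name__ == "__main__":
    A = np.array([[1, 1, 1, 96],
                  [1, 2, 2, 48],
                  [2, 1, 2, 33],
                  [1, 1, 1, 24],
                  [1, 1, 1, 94],
                  [2, 2, 2, 5],
                  [2, 1, 1, 17],
                  [2, 2, 2, 62]])
    print(max_per_position_unique(A))
```

The rows come back in the lexicographic order produced by `np.unique` (first column as the primary key), so the row order differs from the `out` array in the sample run even though the groups and their maxima are the same.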

#4

-1

You can use logical indexing.

I will use random data for an example:

>>> myarr = np.random.random((6, 4))
>>> print(myarr)
[[ 0.7732126   0.48649481  0.29771819  0.91622924]
 [ 0.58294263  0.32025559  0.6925856   0.0524125 ]
 [ 0.3239913   0.7786444   0.41692853  0.10467392]
 [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
 [ 0.13536096  0.60319054  0.82018125  0.10445047]
 [ 0.1877724   0.96060999  0.39697999  0.59078612]]

To get the row or rows where the last column is the greatest, do this:

>>> greatest = myarr[myarr[:, 3]==myarr[:, 3].max()]
>>> print(greatest)
[[ 0.7732126   0.48649481  0.29771819  0.91622924]]

This takes the last column of myarr, finds the maximum of that column, builds a boolean mask of the elements equal to that maximum, and then selects the corresponding rows.

#5

-1

You can use np.argmax

x[np.argmax(x[:,3]),:]

>>> x = np.random.random((5,4))
>>> x
array([[ 0.25461146,  0.35671081,  0.54856798,  0.2027313 ],
       [ 0.17079029,  0.66970362,  0.06533572,  0.31704254],
       [ 0.4577928 ,  0.69022073,  0.57128696,  0.93995176],
       [ 0.29708841,  0.96324181,  0.78859008,  0.25433235],
       [ 0.58739451,  0.17961551,  0.67993786,  0.73725493]])
>>> x[np.argmax(x[:,3]),:]
array([ 0.4577928 ,  0.69022073,  0.57128696,  0.93995176])
