Python: Split a NumPy array based on a value in the array

Date: 2021-07-29 12:18:19

I have one big array:

[(1.0, 3.0, 1, 427338.4297000002, 4848489.4332)
 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692)
 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469) ...,
 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592)
 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351)
 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)]

I want to split this array into multiple arrays based on the 2nd value in each row (3.0, 3.0, 3.0, ..., 1.0, 1.0, 1.0).

Every time the 2nd value changes, I want a new array, so basically each new array has the same 2nd value. I've looked this up on Stack Overflow and know of the command

np.split(array, number)

but I'm not trying to split the array into a fixed number of arrays; I want to split it wherever that value changes. How can I split the array in the way described above? Any help would be appreciated!

1 solution

#1


12  

You can find the indices where the values change by applying numpy.diff and numpy.where to the second column (index 1), and then pass those indices to numpy.split:

>>> import numpy as np
>>> arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
...  (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
...  (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
...  (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
...  (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
...  (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])
>>> np.split(arr, np.where(np.diff(arr[:,1]))[0]+1)
[array([[  1.00000000e+00,   3.00000000e+00,   1.00000000e+00,
          4.27338430e+05,   4.84848943e+06],
       [  1.00000000e+00,   3.00000000e+00,   2.00000000e+00,
          4.27344794e+05,   4.84848207e+06],
       [  1.00000000e+00,   3.00000000e+00,   3.00000000e+00,
          4.27346430e+05,   4.84847275e+06]]),
 array([[  1.00000000e+00,   1.00000000e+00,   7.08400000e+03,
          4.27345271e+05,   4.84879659e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08500000e+03,
          4.27352928e+05,   4.84879094e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08600000e+03,
          4.27359161e+05,   4.84878743e+06]])]

Explanation:

First, we select the values in the second column (index 1):

>>> arr[:,1]
array([ 3.,  3.,  3.,  1.,  1.,  1.])
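A side note on this step: the array in the question is printed with its rows as tuples, which is how NumPy displays structured (record) arrays. If that is what you have, the array is 1-D and arr[:,1] will raise an IndexError; you would select the column by field name instead. The sketch below assumes such an array. The field names ("a", "b", "id", "x", "y") are hypothetical, since the post does not show the dtype.

```python
import numpy as np

# Hypothetical field names and types; the original post does not show the dtype.
dtype = [("a", "f8"), ("b", "f8"), ("id", "i8"), ("x", "f8"), ("y", "f8")]
rec = np.array(
    [(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
     (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
     (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
     (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351)],
    dtype=dtype,
)

# A structured array is 1-D, so pick the column by field name, not rec[:, 1].
keys = rec["b"]

# Same idea as for the plain 2-D array: find where the key changes, then split.
boundaries = np.flatnonzero(np.diff(keys)) + 1
parts = np.split(rec, boundaries)

print([p["id"].tolist() for p in parts])  # [[1, 2], [7084, 7085]]
```

If your array is an ordinary 2-D float array, as in this answer, none of this is needed and arr[:,1] works as shown.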

Now, to find out where the values actually change, we can use numpy.diff:

>>> np.diff(arr[:,1])
array([ 0.,  0., -2.,  0.,  0.])

Any non-zero entry means that the next item differs from the current one. We can use numpy.where to find the indices of the non-zero entries and then add 1, because each change is reported one position before the row where the new value starts, and np.split expects the index at which each new sub-array begins:

>>> np.where(np.diff(arr[:,1]))[0]+1
array([3])
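Putting the steps together, here is a minimal sketch of a reusable helper. The name split_on_change and the toy data are my own, not from the question. It compares neighbouring values with != instead of np.diff, which should behave the same for numeric columns and also works for columns that cannot be subtracted (strings, for example).

```python
import numpy as np

def split_on_change(arr, col):
    """Split a 2-D array into blocks of consecutive rows that share arr[:, col]."""
    keys = arr[:, col]
    # Compare each key with its predecessor; True marks a change.
    # Adding 1 turns "position of the change" into "first row of the new block".
    starts = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    return np.split(arr, starts)

# Toy data: the second column goes 3, 3, 1, 3.
data = np.array([
    [1.0, 3.0, 1.0],
    [1.0, 3.0, 2.0],
    [1.0, 1.0, 3.0],
    [1.0, 3.0, 4.0],
])

blocks = split_on_change(data, 1)
print([b[:, 2].tolist() for b in blocks])  # [[1.0, 2.0], [3.0], [4.0]]
```

Note that the value 3.0 ends up in two separate blocks: the split happens every time the value changes, so only adjacent rows are grouped. If you want all rows with the same value in one block regardless of position, one option is to sort by that column first, e.g. data[np.argsort(data[:, 1], kind="stable")], and then split.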
