Slicing an n-dimensional numpy array with lists of indices

Time: 2022-08-11 21:42:48

Say I have a 3 dimensional numpy array:

import numpy as np
np.random.seed(1145)
A = np.random.random((5,5,5))

and I have two lists of indices corresponding to the 2nd and 3rd dimensions:

second = [1,2]
third = [3,4]

and I want to select the elements in the numpy array corresponding to

A[:][second][third]

so the shape of the sliced array would be (5,2,2) and

A[:][second][third].flatten()

would be equivalent to:

In [226]:

for i in range(5):
    for j in second:
        for k in third:
            print(A[i][j][k])

0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658

Is there a way to slice a numpy array in this way? So far, when I try A[:][second][third], I get IndexError: index 3 is out of bounds for axis 0 with size 2, because the [:] for the first dimension seems to be ignored.
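
A minimal sketch, reusing the seed and index lists from the question, that evaluates the chained expression one bracket at a time. It shows that each bracket indexes axis 0 of whatever the previous bracket returned, which is why the [:] appears to be ignored. The intermediate names step1, step2 and error_message are only for illustration.

```python
import numpy as np

# Same setup as in the question.
np.random.seed(1145)
A = np.random.random((5, 5, 5))
second = [1, 2]
third = [3, 4]

# A[:] is a full slice along axis 0, so it returns (a view of) all of A.
step1 = A[:]
print(step1.shape)  # (5, 5, 5)

# The next [second] is applied to axis 0 again, picking out A[1] and A[2].
step2 = step1[second]
print(step2.shape)  # (2, 5, 5)

# [third] is once more applied to axis 0, which now has length 2,
# so index 3 is out of range and NumPy raises the IndexError from the question.
error_message = None
try:
    step2[third]
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```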

3 solutions

#1

NumPy supports indexing several axes at once with a single tuple of indices, so instead of A[1][2][3] you can, and should, use A[1,2,3].
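
A small sketch of the difference, using an np.arange(125).reshape(5,5,5) test array: with plain integers the chained and tuple spellings reach the same element, but once a slice is involved they select along different axes.

```python
import numpy as np

a = np.arange(125).reshape(5, 5, 5)

# For scalar indices, chained and tuple indexing reach the same element.
assert a[1][2][3] == a[1, 2, 3]

# With a slice in front they differ: a[:][1] is just a[1] (index 1 on axis 0),
# while a[:, 1] takes index 1 along axis 1 for every entry of axis 0.
chained = a[:][1]
tupled = a[:, 1]
print(chained.shape, tupled.shape)      # (5, 5) (5, 5)
print(np.array_equal(chained, tupled))  # False
```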

You might then think you could do A[:, second, third], but NumPy broadcasts the index arrays against each other, and broadcasting second and third (two one-dimensional sequences of length 2) pairs them element by element, the NumPy equivalent of zip, so the result has shape (5, 2).
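
To make the zip analogy concrete, here is a sketch (on an np.arange test array rather than the random one) comparing A[:, second, third] with an explicit loop over zip(second, third):

```python
import numpy as np

a = np.arange(125).reshape(5, 5, 5)
second = [1, 2]
third = [3, 4]

# Both index lists have shape (2,), so they broadcast to (2,) and are paired
# element by element, like zip(second, third): (1, 3) and (2, 4).
zipped = a[:, second, third]
print(zipped.shape)  # (5, 2)

# The same result written out with an explicit zip over the index pairs.
manual = np.array([[a[i, j, k] for j, k in zip(second, third)] for i in range(5)])
print(np.array_equal(zipped, manual))  # True
```

Only the pairs (1, 3) and (2, 4) are visited, never (1, 4) or (2, 3), which is why the result has two columns instead of a 2x2 block.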

What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by turning one of them, say second, into a two-dimensional array with shape (2,1). The shape that results from broadcasting second and third together is then (2,2).

For example:

In [8]: import numpy as np

In [9]: a = np.arange(125).reshape(5,5,5)

In [10]: second = [1,2]

In [11]: third = [3,4]

In [12]: s = a[:, np.array(second).reshape(-1,1), third]

In [13]: s.shape
Out[13]: (5, 2, 2)
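
A quick check, sketched on the same np.arange array, that the broadcast result flattens to exactly the values the triple loop in the question visits. It also uses [:, np.newaxis], which is an equivalent spelling of .reshape(-1, 1) for a 1-D array.

```python
import numpy as np

a = np.arange(125).reshape(5, 5, 5)
second = [1, 2]
third = [3, 4]

# A (2, 1) column broadcast against a (2,) row gives a (2, 2) grid of index pairs.
rows = np.array(second)[:, np.newaxis]  # same as np.array(second).reshape(-1, 1)
s = a[:, rows, third]

# Flattening in C order matches the i, j, k loop order from the question.
loop_values = [a[i, j, k] for i in range(5) for j in second for k in third]
print(s.shape)                                 # (5, 2, 2)
print(np.array_equal(s.ravel(), loop_values))  # True
```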

Note that, in this specific example, the values in second and third are consecutive. If that is typical, you can simply use slices:
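
If the index lists come from elsewhere and are only sometimes consecutive, one option is to convert them to slices when possible. The sketch below uses a hypothetical helper, as_slice_if_contiguous, which is not part of NumPy; it only turns runs of consecutive, non-negative, increasing integers into a slice and leaves everything else as an index array.

```python
import numpy as np

def as_slice_if_contiguous(indices):
    """Hypothetical helper (not part of NumPy): return slice(start, stop) when
    `indices` is a run of consecutive, non-negative, increasing integers;
    otherwise return the indices unchanged as an array."""
    idx = np.asarray(indices)
    if (idx.ndim == 1 and idx.size > 0
            and np.issubdtype(idx.dtype, np.integer)
            and idx[0] >= 0
            and np.all(np.diff(idx) == 1)):
        return slice(int(idx[0]), int(idx[-1]) + 1)
    return idx

a = np.arange(125).reshape(5, 5, 5)
print(as_slice_if_contiguous([1, 2]))     # slice(1, 3, None)
print(as_slice_if_contiguous([0, 2, 3]))  # [0 2 3]

# With two consecutive runs this reproduces a[:, 1:3, 3:5], which is a view.
s2 = a[:, as_slice_if_contiguous([1, 2]), as_slice_if_contiguous([3, 4])]
print(s2.shape)                 # (5, 2, 2)
print(np.shares_memory(s2, a))  # True
```

Negative indices are deliberately left as arrays, because a run like [-1, 0] would otherwise become the empty slice -1:1. If both lists fall back to arrays, you still need the reshape trick to get the outer product.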

In [14]: s2 = a[:, 1:3, 3:5]

In [15]: s2.shape
Out[15]: (5, 2, 2)

In [16]: np.all(s == s2)
Out[16]: True

There are a couple of very important differences between these two methods.

  • The first method also works with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (You will sometimes see this style of indexing referred to as "fancy indexing".)
  • In the first method (using broadcasting and "fancy indexing"), the result is a copy of the data in the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a, so an in-place change to one will change both.
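
A sketch of both points on the np.arange example: np.shares_memory reports whether two arrays overlap in memory, and writing through each result shows which one is a view. The last lines use the non-consecutive index list [0, 2, 3] mentioned above.

```python
import numpy as np

a = np.arange(125).reshape(5, 5, 5)

s = a[:, np.array([1, 2]).reshape(-1, 1), [3, 4]]  # fancy indexing -> copy
s2 = a[:, 1:3, 3:5]                                # basic slicing -> view

print(np.shares_memory(s, a))   # False
print(np.shares_memory(s2, a))  # True

# Writing through the view changes the original array...
s2[0, 0, 0] = -1
print(a[0, 1, 3])  # -1
# ...while writing into the copy leaves it untouched.
s[0, 0, 1] = -2
print(a[0, 1, 4])  # 9

# Fancy indexing also accepts index lists that no slice can express.
s3 = a[:, np.array([0, 2, 3]).reshape(-1, 1), [3, 4]]
print(s3.shape)  # (5, 3, 2)
```

The copy only concerns reading: an assignment such as a[:, rows, third] = 0 with fancy indices on the left-hand side does write into a.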

#2

One way would be to use np.ix_:

>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True

The downside is that you have to spell out the full range for every axis you are not selecting from (here range(A.shape[0]) for the first axis), but you could wrap that in a function.
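
One possible wrapper, sketched here as a hypothetical function outer_take (the name and the dict-based signature are illustrative, not an existing NumPy API). It fills every axis that is not mentioned with its full range before calling np.ix_.

```python
import numpy as np

def outer_take(arr, indices_by_axis):
    """Hypothetical wrapper around np.ix_: select the outer product of the
    given index lists. `indices_by_axis` maps a non-negative axis number to a
    1-D sequence of indices; axes that are not mentioned keep their full
    range, like ':' would."""
    per_axis = [indices_by_axis.get(axis, range(n)) for axis, n in enumerate(arr.shape)]
    return arr[np.ix_(*per_axis)]

np.random.seed(1145)
A = np.random.random((5, 5, 5))
second = [1, 2]
third = [3, 4]

out = outer_take(A, {1: second, 2: third})
print(out.shape)  # (5, 2, 2)

manual = [A[i, j, k] for i in range(5) for j in second for k in third]
print(np.array_equal(out.ravel(), manual))  # True

# Any subset of axes works, e.g. entries 0 and 4 of axis 0 and entry 1 of axis 2.
print(outer_take(A, {0: [0, 4], 2: [1]}).shape)  # (2, 5, 1)
```

Because np.ix_ always produces advanced indices, the result is a copy, as with the broadcasting approach in #1.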

#3

I think there are three problems with your approach:

  1. Both second and third should be slices.
  2. Since the stop index of a slice is exclusive, they should go from 1 to 3 and from 3 to 5.
  3. Instead of A[:][second][third], you should use A[:,second,third].

Try this:

>>> import numpy as np
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482,  0.80820122,  0.64878266,  0.62689481,  0.01298507,
        0.42112921,  0.23104051,  0.34601169,  0.24838564,  0.66162209,
        0.96115751,  0.07338851,  0.33109539,  0.55168356,  0.33925748,
        0.2353348 ,  0.91254398,  0.44692211,  0.60975602,  0.64610556])
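
To see why the stop values are 3 and 5, and that this selects the same block as the broadcasting approach in #1, here is a short sketch with the question's seeded array:

```python
import numpy as np

np.random.seed(1145)
A = np.random.random((5, 5, 5))

# "1:3" inside brackets is shorthand for slice(1, 3); the stop value is excluded.
print(np.arange(5)[slice(1, 3)])  # [1 2]
print(np.arange(5)[slice(3, 5)])  # [3 4]

# So the slice objects select exactly the indices [1, 2] and [3, 4]...
by_slice = A[:, slice(1, 3), slice(3, 5)]
# ...and agree with the broadcast fancy-indexing version.
by_fancy = A[:, np.array([1, 2])[:, None], [3, 4]]
print(np.array_equal(by_slice, by_fancy))  # True
```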
