NumPy array slicing: how are the indices ordered?

Time: 2021-04-15 21:34:43

I have an np.array data of shape (28,8,20), and I only need certain entries from it, so I'm taking a slice:

In [41]: index = np.array([ 5,  6,  7,  8,  9, 10, 11, 17, 18, 19])
In [42]: extract = data[:,:,index]
In [43]: extract.shape
Out[43]: (28, 8, 10)

So far so good, everything as it should be. But now I want to look at just the first two entries along the last axis for the first line:

In [45]: extract[0,:,np.array([0,1])].shape
Out[45]: (2, 8)

Wait, that should be (8,2). It swapped the axes around, even though it did not do that when I sliced the first time! As far as I understand, the following should behave the same way:

In [46]: extract[0,:,:2].shape
Out[46]: (8, 2)

... but it gives me exactly what I wanted! As long as I have a 3D-array, though, both methods seem to be equivalent:

In [47]: extract[:,:,np.array([0,1])].shape
Out[47]: (28, 8, 2)

In [48]: extract[:,:,:2].shape
Out[48]: (28, 8, 2)
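
The observations above can be reproduced with a short script. This is a sketch that assumes random stand-in values for data; only its shape (28,8,20) and the index arrays come from the question.

```python
import numpy as np

# Stand-in for the question's data: random values, shape (28, 8, 20).
rng = np.random.default_rng(0)
data = rng.random((28, 8, 20))

index = np.array([5, 6, 7, 8, 9, 10, 11, 17, 18, 19])
extract = data[:, :, index]
print(extract.shape)  # (28, 8, 10)

first_two = np.array([0, 1])
print(extract[0, :, first_two].shape)  # (2, 8): integer and array separated by ':'
print(extract[0, :, :2].shape)         # (8, 2): basic slicing only
print(extract[:, :, first_two].shape)  # (28, 8, 2): a single advanced index
print(extract[:, :, :2].shape)         # (28, 8, 2)
```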

So what do I do if I want not just the first two entries but an irregular list? I could of course transpose the matrix after the operation but this seems very counter-intuitive. A better solution to my problem is this (though there might be a more elegant one):

In [64]: extract[0][:,[0,1]].shape
Out[64]: (8, 2)
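
For an irregular list, a few ways to get the (8, k) layout are sketched below. The array is a stand-in with the shape of the question's extract, cols is a hypothetical irregular list, and np.moveaxis stands in for the transpose mentioned above.

```python
import numpy as np

# Stand-in with the shape of the question's extract array.
extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)
cols = [0, 1, 7]  # hypothetical irregular list of positions on the last axis

# 1. Chained indexing: extract[0] removes the first axis, so cols is the only advanced index.
a = extract[0][:, cols]

# 2. Use a length-1 slice instead of the integer, then drop that axis.
b = extract[0:1, :, cols][0]

# 3. Keep the one-step expression and move the leading (len(cols),) axis to the end.
c = np.moveaxis(extract[0, :, cols], 0, -1)

print(a.shape, b.shape, c.shape)  # (8, 3) (8, 3) (8, 3)
```

All three produce the same values; the chained form is usually the easiest to read.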

Which brings us to the actual question:

What is the reason for this behaviour? Whoever decided that it should work this way probably knew more about programming than I do, and saw some consistency in it that I am entirely missing. I will likely keep hitting my head on this unless I have a way to make sense of it.

2 answers

#1


This is a case of (advanced) partial indexing. There are 2 index arrays (the scalar 0 is treated as one) and 1 slice.

If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.

http://docs.scipy.org/doc/numpy-1.8.1/reference/arrays.indexing.html#advanced-indexing
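
The rule quoted above can be checked on a small array. The 4D shape and the index arrays i and j are arbitrary choices for illustration, not taken from the docs.

```python
import numpy as np

x = np.zeros((5, 6, 7, 8))
i = np.array([0, 1])  # i and j broadcast to shape (2,)
j = np.array([2, 3])

# Advanced indices next to each other: the (2,) subspace replaces axes 1 and 2 in place.
print(x[:, i, j, :].shape)  # (5, 2, 8)

# Advanced indices separated by a slice: the (2,) subspace goes first,
# followed by the sliced axes of x.
print(x[i, :, j, :].shape)  # (2, 6, 8)
```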

The advanced indexing example there notes, for an x of shape (10,20,30,40,50) and index arrays ind_1, ind_2 that broadcast to shape (2,3,4), that:

However, x[:,ind_1,:,ind_2] has shape (2,3,4,10,30,50) because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always possible to use .transpose() to move the subspace anywhere desired.

In other words, this indexing is not the same as first indexing with ind_1 (x[:,ind_1]) and then indexing that result with ind_2. The 2 arrays operate jointly to define a single (2,3,4) subspace.
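
The shapes in the quoted docs example can be checked directly. This sketch assumes all-zero index arrays (any in-bounds values would do) and uses np.empty with a one-byte dtype only to keep memory use modest (about 12 MB).

```python
import numpy as np

# Shapes from the docs example; the values are irrelevant, so memory is left uninitialised.
x = np.empty((10, 20, 30, 40, 50), dtype=np.int8)
ind_1 = np.zeros((2, 3, 4), dtype=np.intp)
ind_2 = np.zeros((3, 4), dtype=np.intp)  # broadcasts against ind_1 to (2, 3, 4)

# Adjacent: the (2, 3, 4) subspace replaces the (20, 30) axes in place.
print(x[:, ind_1, ind_2].shape)     # (10, 2, 3, 4, 40, 50)
# Separated by a slice: the subspace is tacked on to the beginning.
print(x[:, ind_1, :, ind_2].shape)  # (2, 3, 4, 10, 30, 50)
```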

In your example, extract[0,:,np.array([0,1])] is understood to mean: select a (2,) subspace (the 0 and the [0,1] act jointly, not sequentially) and combine it with the middle dimension. Because the slice separates the two indices, the (2,) subspace is placed first, so the result has shape (2,8).
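
One way to see that the 0 and the [0,1] act jointly: writing the scalar out as an explicit index array gives the same result, while replacing it with a length-1 slice leaves only one advanced index. The array is a stand-in with the shape of the question's extract.

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in array

r1 = extract[0, :, [0, 1]]       # scalar 0 broadcast against [0, 1]
r2 = extract[[0, 0], :, [0, 1]]  # the same pairing written out explicitly
print(r1.shape, np.array_equal(r1, r2))  # (2, 8) True

# A length-1 slice is basic indexing, so [0, 1] is the only advanced index
# and its axis stays in place.
print(extract[0:1, :, [0, 1]].shape)  # (1, 8, 2)
```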

A more elaborate example would be extract[[1,0],:,[[0,1],[1,0]]], which produces a (2,2,8) array: a (2,2) subspace of the 1st and last dimensions, plus the middle one. On the other hand, extract[[1,0]][:,:,[[0,1],[1,0]]] produces a (2,8,2,2) array, selecting from the 1st and last dimensions separately.
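
Both shapes from this example can be verified with a stand-in array of the question's extract shape; the element checks at the end show how the indices are paired in each case.

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in array

rows = [1, 0]            # shape (2,)
cols = [[0, 1], [1, 0]]  # shape (2, 2); rows broadcasts against it to (2, 2)

joint = extract[rows, :, cols]          # one step: indices paired elementwise
sequential = extract[rows][:, :, cols]  # two steps: each list acts on its own axis
print(joint.shape, sequential.shape)    # (2, 2, 8) (2, 8, 2, 2)

# Broadcast rows are [[1, 0], [1, 0]], so joint[0, 1] pairs row 0 with column 1.
print(np.array_equal(joint[0, 1], extract[0, :, 1]))  # True
# In the sequential result, row index and column index vary independently.
print(np.array_equal(sequential[0, :, 0, 1], extract[1, :, 1]))  # True
```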

The key difference is whether the indexed selections operate sequentially or jointly. The [...][...] syntax is already available for sequential selection; advanced indexing gives you a way to index jointly.
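
A minimal 2D sketch of the joint-versus-sequential distinction, using a hypothetical 3x3 array. np.ix_ is included as one existing NumPy helper that produces the sequential (block) selection in a single indexing step.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)  # [[0 1 2] [3 4 5] [6 7 8]]

joint = a[[0, 2], [0, 2]]          # pairs (0, 0) and (2, 2) -> elements
sequential = a[[0, 2]][:, [0, 2]]  # rows 0 and 2, then columns 0 and 2 -> a block
print(joint)       # [0 8]
print(sequential)  # [[0 2]
                   #  [6 8]]

# np.ix_ turns the two lists into an open mesh, giving the block in one step.
print(np.array_equal(a[np.ix_([0, 2], [0, 2])], sequential))  # True
```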

#2


You're right, that's weird. I can only hazard a guess here. I think it's related to the fact that a[[0,1],[0,1],[0,1]].shape is (2,) rather than (2,2,2) and that a[0,1,[0,1,2]] really means a[[0,0,0],[1,1,1],[0,1,2]] which evaluates to array([a[0,1,0],a[0,1,1],a[0,1,2]]). That is, you step through lists-as-indices for each dimension in parallel, with length-one lists and scalars being broadcast to match the longest.
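
The claims in this paragraph can be checked on a small array; the (2,3,4) shape is an arbitrary choice that keeps all the indices in bounds.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)  # small example array

# Three index lists are stepped through in parallel: 2 elements, not a 2x2x2 block.
print(a[[0, 1], [0, 1], [0, 1]].shape)  # (2,)

lhs = a[0, 1, [0, 1, 2]]
rhs = a[[0, 0, 0], [1, 1, 1], [0, 1, 2]]  # scalars broadcast to the list length
by_hand = np.array([a[0, 1, 0], a[0, 1, 1], a[0, 1, 2]])
print(lhs)  # [4 5 6]
print(np.array_equal(lhs, rhs), np.array_equal(lhs, by_hand))  # True True
```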

Conceptually, that would make your extract[0,:,[0,1]] equivalent to extract[[0,0],[slice(None),slice(None)],[0,1]] (though that syntax isn't accepted if you write it out manually). Stepping through the indices, that would evaluate to array([extract[0,slice(None),0], extract[0,slice(None),1]]). Each inner extract evaluates to a shape (8,) array, so the full result has shape (2,8).
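
The equivalence described here can be checked numerically with a stand-in array of the question's extract shape:

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in array

result = extract[0, :, [0, 1]]
stacked = np.array([extract[0, :, 0], extract[0, :, 1]])  # two (8,) rows
print(result.shape, np.array_equal(result, stacked))  # (2, 8) True
```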

So, to conclude, I think it is a side effect of the broadcasting that gives every dimension an index list of the same length, which leads to the ':' being broadcast too. That is my hypothesis, but I haven't looked at the inner workings of how numpy does this. Perhaps an expert will come along with a better explanation.

This hypothesis does not explain why extract[:,:,[0,1]] does not show the same behavior. I would have to postulate that the case where only leading ':' slices appear is special-cased to avoid participating in the list-index logic.
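
Under the rule quoted in answer #1, the distinguishing factor would be whether the advanced indices (the integer counts as one) are separated by a slice, rather than whether ':' comes first. A quick shape comparison on a stand-in array is consistent with that reading:

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in array

# A single advanced index: its axis stays where the list is.
print(extract[:, :, [0, 1]].shape)  # (28, 8, 2)
# Integer and list on adjacent axes: the joint (2,) subspace also stays in place.
print(extract[:, 0, [0, 1]].shape)  # (28, 2)
# Integer and list separated by a slice: the joint subspace moves to the front.
print(extract[0, :, [0, 1]].shape)  # (2, 8)
```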
