numpy: how are array slice indices ordered?

Time: 2021-04-15 21:34:43

I have an np.array data of shape (28,8,20), and I only need certain entries from it, so I'm taking a slice:


In [41]: index = np.array([ 5,  6,  7,  8,  9, 10, 11, 17, 18, 19])
In [42]: extract = data[:,:,index]
In [43]: extract.shape
Out[43]: (28, 8, 10)

So far so good, everything as it should be. But now I want to look at just the first two entries on the last index for the first line:


In [45]: extract[0,:,np.array([0,1])].shape
Out[45]: (2, 8)

Wait, that should be (8,2). It switched the indices around, even though it did not when I sliced the last time! According to my understanding, the following should act the same way:


In [46]: extract[0,:,:2].shape
Out[46]: (8, 2)

... but it gives me exactly what I wanted! As long as I have a 3D-array, though, both methods seem to be equivalent:


In [47]: extract[:,:,np.array([0,1])].shape
Out[47]: (28, 8, 2)

In [48]: extract[:,:,:2].shape
Out[48]: (28, 8, 2)

So what do I do if I want not just the first two entries but an irregular list? I could of course transpose the matrix after the operation but this seems very counter-intuitive. A better solution to my problem is this (though there might be a more elegant one):


In [64]: extract[0][:,[0,1]].shape
Out[64]: (8, 2)

Which brings us to the actual question:



I wonder what the reason for this behaviour is? Whoever decided that this is how it should work probably knew more about programming than I do and thought that this is consistent in some way that I am entirely missing. And I will likely keep hitting my head on this unless I have a way to make sense of it.


2 Answers



This is a case of (advanced) partial indexing: there are two index arrays and one slice. As the NumPy indexing documentation puts it:


If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.


The advanced indexing example in the documentation notes, for the case where ind_1 and ind_2 broadcast to a subspace of shape (2,3,4), that:


However, x[:,ind_1,:,ind_2] has shape (2,3,4,10,30,50) because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always possible to use .transpose() to move the subspace anywhere desired.
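The two shapes from that documentation example can be checked directly. The sketch below is only illustrative: the array contents, the int8 dtype (chosen to keep memory small) and the all-zero index arrays are assumptions, and only the shapes matter.

```python
import numpy as np

# Shapes follow the documentation example; contents are placeholders.
x = np.zeros((10, 20, 30, 40, 50), dtype=np.int8)
ind_1 = np.zeros((2, 3, 4), dtype=int)
ind_2 = np.zeros((2, 3, 4), dtype=int)

# Index arrays next to each other: the (2, 3, 4) subspace replaces axes 1 and 2 in place.
adjacent = x[:, ind_1, ind_2]

# Index arrays separated by a slice: the (2, 3, 4) subspace moves to the front.
separated = x[:, ind_1, :, ind_2]

print(adjacent.shape)   # (10, 2, 3, 4, 40, 50)
print(separated.shape)  # (2, 3, 4, 10, 30, 50)

# The subspace can be moved afterwards, e.g. to sit right behind the first axis.
moved = np.moveaxis(separated, [0, 1, 2], [1, 2, 3])
print(moved.shape)      # (10, 2, 3, 4, 30, 50)
```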


In other words, this indexing is not the same as x[:, ind_1][:, ind_2]. The 2 arrays operate jointly to define a (2,3,4) subspace.


In your example, extract[0,:,np.array([0,1])] is understood to mean: select a (2,) subspace (the [0] and [0,1] act jointly, not sequentially), and combine that with the middle dimension, which is placed after it.
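A small check of that reading, assuming the array is filled with np.arange (the question does not show the actual contents of data) and using cols as an illustrative name:

```python
import numpy as np

data = np.arange(28 * 8 * 20).reshape(28, 8, 20)   # placeholder contents
index = np.array([5, 6, 7, 8, 9, 10, 11, 17, 18, 19])
extract = data[:, :, index]                         # shape (28, 8, 10)

cols = np.array([0, 1])
joint = extract[0, :, cols]                         # 0 and cols pair up -> shape (2, 8)

# Row k of the result is the length-8 vector extract[0, :, cols[k]].
for k in range(len(cols)):
    assert np.array_equal(joint[k], extract[0, :, cols[k]])

# Transposing recovers the (8, 2) layout the question expected.
assert np.array_equal(joint.T, extract[0][:, cols])
```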


A more elaborate example would be extract[[1,0],:,[[0,1],[1,0]]], which produces a (2,2,8) array. This is a (2,2) subspace of the 1st and last dimensions, plus the middle one. On the other hand, extract[[1,0]][:,:,[[0,1],[1,0]]] produces a (2,8,2,2) array, selecting from the 1st and last dimensions separately.
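Both shapes can be reproduced with a stand-in array of the same shape as extract. The np.arange contents and the names rows and cols are assumptions for illustration:

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)   # placeholder contents
rows = [1, 0]
cols = [[0, 1], [1, 0]]

joint = extract[rows, :, cols]
sequential = extract[rows][:, :, cols]
print(joint.shape)       # (2, 2, 8)
print(sequential.shape)  # (2, 8, 2, 2)

# Joint: rows broadcasts against cols, so entry (i, j) pairs rows[j] with cols[i][j].
assert np.array_equal(joint[0, 1], extract[0, :, 1])
assert np.array_equal(joint[1, 0], extract[1, :, 1])

# Sequential: every selected row is combined with every entry of cols.
assert sequential[0, 3, 1, 0] == extract[1, 3, 1]
```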


The key difference is whether the indexed selections operate sequentially or jointly. The `[...][...]` syntax is already available to operate sequentially. Advanced indexing gives you a way to index jointly.
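For the irregular-list case from the question, here are a few ways to get the (8, n) layout, sketched with a hypothetical selection cols and placeholder data:

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)   # placeholder contents
cols = [0, 1, 5]                                       # hypothetical irregular selection

a = extract[0][:, cols]                 # sequential: pick the row, then the columns
b = np.take(extract[0], cols, axis=1)   # same selection via np.take
c = extract[0, :, cols].T               # joint indexing, then transpose back

print(a.shape)  # (8, 3)
assert np.array_equal(a, b) and np.array_equal(a, c)
```

Which form reads best is a matter of taste; the chained version states the sequential intent most directly.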




You're right, that's weird. I can only hazard a guess here. I think it's related to the fact that a[[0,1],[0,1],[0,1]].shape is (2,) rather than (2,2,2) and that a[0,1,[0,1,2]] really means a[[0,0,0],[1,1,1],[0,1,2]] which evaluates to array([a[0,1,0],a[0,1,1],a[0,1,2]]). That is, you step through lists-as-indices for each dimension in parallel, with length-one lists and scalars being broadcast to match the longest.
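The parallel stepping and the scalar broadcasting can be seen on a small array. The shape (2, 3, 4) and the np.arange contents are chosen only for illustration:

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Three length-2 lists are walked in parallel: a[0, 0, 0] and a[1, 1, 1].
diag = a[[0, 1], [0, 1], [0, 1]]
print(diag.shape)  # (2,)

# Scalars broadcast against the list, so these two expressions agree.
short = a[0, 1, [0, 1, 2]]
full = a[[0, 0, 0], [1, 1, 1], [0, 1, 2]]
print(short)  # [4 5 6]
assert np.array_equal(short, full)
```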


Conceptually, that would make your extract[0,:,[0,1]] equivalent to extract[[0,0],[slice(None),slice(None)],[0,1]] (that syntax isn't accepted if you specify it manually, though). After stepping through the indices, that would evaluate to array([extract[0,slice(None),0], extract[0,slice(None),1]]). Each of the inner extracts evaluates to a shape (8,) array, so the full result is shape (2,8).
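The stepped-through form can be written out by hand and compared with the indexed result. This only checks the conceptual picture and is not a claim about how NumPy implements it; the array contents are placeholders:

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)   # placeholder contents

# Build the result by hand: one length-8 vector per entry of the index list.
by_hand = np.stack([extract[0, :, 0], extract[0, :, 1]])
indexed = extract[0, :, [0, 1]]

print(by_hand.shape)  # (2, 8)
assert np.array_equal(by_hand, indexed)
```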


So, to conclude, I think it is a side effect of the broadcasting that is done to give every dimension an index list of the same length, which leads to : being broadcast too. That is my hypothesis, but I haven't looked at the inner workings of how numpy does this. Perhaps an expert will come along with a better explanation.


This hypothesis does not explain why extract[:,:,[0,1]] does not result in the same behavior. I would have to postulate that the case of only leading ":" is special-cased so that it does not participate in the list-index logic.
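Comparing a few index combinations suggests the deciding factor is not whether the slices are leading. As the NumPy indexing documentation describes it, what matters is whether the advanced indices (an integer scalar counts as one when combined with an index array) are adjacent or separated by a slice. A sketch with placeholder data:

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)   # placeholder contents

# A single index array: its dimension stays where the index was.
print(extract[:, :, [0, 1]].shape)       # (28, 8, 2)

# Scalar and list are adjacent: the joint (2,) dimension stays in place.
print(extract[:, 0, [0, 1]].shape)       # (28, 2)

# Scalar and list separated by a slice: the joint (2,) dimension goes first.
print(extract[0, :, [0, 1]].shape)       # (2, 8)

# Two lists separated by a slice behave the same way.
print(extract[[0, 1], :, [0, 1]].shape)  # (2, 8)
```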




