Selecting multiple slices from a numpy array at once

I'm looking for a way to select multiple slices from a numpy array at once. Say we have a 1D data array and want to extract three portions of it like below:

data_extractions = []

for start_index in range(0, 3):
    data_extractions.append(data[start_index: start_index + 5])

Afterwards data_extractions will be:

data_extractions = [
    data[0:5],
    data[1:6],
    data[2:7]
]

Is there any way to perform the above operation without the for loop? Some sort of indexing scheme in numpy that would let me select multiple slices from an array and return them as that many arrays, say in an n+1 dimensional array?

I thought maybe I could replicate my data and then select a span from each row, but the code below throws an IndexError:

replicated_data = np.vstack([data] * 3)
data_extractions = replicated_data[[range(3)], [slice(0, 5), slice(1, 6), slice(2, 7)]]

6 Solutions

#1

You can use an array of indexes to select the rows you want into the appropriate shape. For example:

 data = np.random.normal(size=(100,2,2,2))

 # Creating an array of row-indexes
 indexes = np.array([np.arange(0,5), np.arange(1,6), np.arange(2,7)])
 # data[indexes] will return an element of shape (3,5,2,2,2). Converting
 # to list happens along axis 0
 data_extractions = list(data[indexes])

 np.all(data_extractions[1] == data[1:6])
 True

#2

This post presents an approach with a strided-indexing scheme using np.lib.stride_tricks.as_strided, which creates a view into the input array. Creating it is therefore efficient, and being a view, it occupies no more memory space. This also works for ndarrays with a generic number of dimensions.

Here's the implementation -

def strided_axis0(a, L):
    # Store the shape and strides info
    shp = a.shape
    s  = a.strides

    # Compute length of output array along the first axis
    nd0 = shp[0]-L+1

    # Setup shape and strides for use with np.lib.stride_tricks.as_strided
    # and get (n+1) dim output array
    shp_in = (nd0,L)+shp[1:]
    strd_in = (s[0],) + s
    return np.lib.stride_tricks.as_strided(a, shape=shp_in, strides=strd_in)

Sample run for a 4D array case -

In [44]: a = np.random.randint(11,99,(10,4,2,3)) # Array

In [45]: L = 5      # Window length along the first axis

In [46]: out = strided_axis0(a, L)

In [47]: np.allclose(a[0:L], out[0])  # Verify outputs
Out[47]: True

In [48]: np.allclose(a[1:L+1], out[1])
Out[48]: True

In [49]: np.allclose(a[2:L+2], out[2])
Out[49]: True
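
For comparison, here is a minimal sketch of the same windowing done with np.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or later is available. It is not part of the answer above. It places the window axis last, so np.moveaxis is used to get the same layout as strided_axis0, and the view it returns is read-only by default.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.random.randint(11, 99, (10, 4, 2, 3))  # same shape as the sample run
L = 5                                          # window length along axis 0

# The window axis is appended last: shape (6, 4, 2, 3, 5)
win = sliding_window_view(a, L, axis=0)

# Move the window axis next to axis 0 to get shape (6, 5, 4, 2, 3),
# so that out[i] corresponds to a[i:i+L]
out = np.moveaxis(win, -1, 1)
```

Both win and out are views into a, so no data is copied until you ask for a copy.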

#3

You can slice your array with a prepared slicing array

a = np.array(list('abcdefg'))

b = np.array([
        [0, 1, 2, 3, 4],
        [1, 2, 3, 4, 5],
        [2, 3, 4, 5, 6]
    ])

a[b]

However, b doesn't have to be generated by hand in this way. It can be built dynamically with

b = np.arange(5) + np.arange(3)[:, None]
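
As a rough sketch of how this generalizes, the broadcasting trick can be wrapped in a small helper that takes arbitrary start offsets and a window length. The name window_indices and its parameters are made up for illustration and are not part of numpy.

```python
import numpy as np

def window_indices(starts, length):
    # Hypothetical helper: builds one row of indices per start offset.
    starts = np.asarray(starts)
    # (n, 1) + (length,) broadcasts to an (n, length) index array
    return starts[:, None] + np.arange(length)

a = np.array(list('abcdefg'))
b = window_indices([0, 1, 2], 5)
windows = a[b]  # shape (3, 5); integer-array indexing returns a copy
```

The start offsets do not need to be consecutive, only the window length has to be the same for every row.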

#4

stride_tricks can do that

a = np.arange(10)
b = np.lib.stride_tricks.as_strided(a, (3, 5), 2 * a.strides)
b
# array([[0, 1, 2, 3, 4],
#        [1, 2, 3, 4, 5],
#        [2, 3, 4, 5, 6]])

Please note that b references the same memory as a, in fact multiple times (for example b[0, 1] and b[1, 0] are the same memory address). It is therefore safest to make a copy before working with the new structure.
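
A small sketch of what that aliasing means in practice, using the same a and b as above (the variable name safe is just for illustration): a write through b shows up in a and in every other position of b that maps to the same element, while a copy taken beforehand is unaffected.

```python
import numpy as np

a = np.arange(10)
b = np.lib.stride_tricks.as_strided(a, (3, 5), 2 * a.strides)

safe = b.copy()  # independent memory, safe to modify
b[0, 1] = 99     # writes through the view into a[1]

# a[1] and b[1, 0] now both read 99, while safe[0, 1] is still 1
```

If you never intend to write to the windows, passing writeable=False to as_strided is another way to guard against accidental writes.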

The n-d case can be handled in a similar fashion, for example 2d -> 4d:

a = np.arange(16).reshape(4, 4)
b = np.lib.stride_tricks.as_strided(a, (3,3,2,2), 2*a.strides)
b.reshape(9,2,2) # this forces a copy
# array([[[ 0,  1],
#         [ 4,  5]],

#        [[ 1,  2],
#         [ 5,  6]],

#        [[ 2,  3],
#         [ 6,  7]],

#        [[ 4,  5],
#         [ 8,  9]],

#        [[ 5,  6],
#         [ 9, 10]],

#        [[ 6,  7],
#         [10, 11]],

#        [[ 8,  9],
#         [12, 13]],

#        [[ 9, 10],
#         [13, 14]],

#        [[10, 11],
#         [14, 15]]])

#5

In the general case you have to do some sort of iteration - and concatenation - either when constructing the indexes or when collecting the results. It's only when the slicing pattern is itself regular that you can use a generalized slicing via as_strided.

The accepted answer constructs an indexing array, one row per slice. So that is iterating over the slices, and arange itself is a (fast) iteration. And np.array concatenates them on a new axis (np.stack generalizes this).

In [264]: np.array([np.arange(0,5), np.arange(1,6), np.arange(2,7)])
Out[264]: 
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])

The index tricks convenience objects, such as np.r_, do the same thing:

In [265]: np.r_[0:5, 1:6, 2:7]
Out[265]: array([0, 1, 2, 3, 4, 1, 2, 3, 4, 5, 2, 3, 4, 5, 6])

This takes the slicing notation, expands it with arange and concatenates. It even lets me expand and concatenate into 2d

In [269]: np.r_['0,2',0:5, 1:6, 2:7]
Out[269]: 
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])

In [270]: data=np.array(list('abcdefghijk'))
In [272]: data[np.r_['0,2',0:5, 1:6, 2:7]]
Out[272]: 
array([['a', 'b', 'c', 'd', 'e'],
       ['b', 'c', 'd', 'e', 'f'],
       ['c', 'd', 'e', 'f', 'g']], 
      dtype='<U1')
In [273]: data[np.r_[0:5, 1:6, 2:7]]
Out[273]: 
array(['a', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'f', 'c', 'd', 'e',
       'f', 'g'], 
      dtype='<U1')

Concatenating results after indexing also works.

In [274]: np.stack([data[0:5],data[1:6],data[2:7]])

My memory from other SO questions is that relative timings are in the same order of magnitude. It may vary for example with the number of slices versus their length. Overall the number of values that have to be copied from source to target will be the same.

If the slices vary in length, you'd have to use the flat indexing.
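
A minimal sketch of that flat-indexing idea, assuming some made-up (start, stop) pairs of different lengths: build one flat index array, index once, then cut the result back into per-slice pieces. The bounds values here are hypothetical.

```python
import numpy as np

data = np.array(list('abcdefghijk'))

# Hypothetical slices of different lengths, as (start, stop) pairs
bounds = [(0, 5), (1, 3), (4, 8)]

# One flat index array covering all slices, equivalent to np.r_[0:5, 1:3, 4:8]
flat_idx = np.concatenate([np.arange(start, stop) for start, stop in bounds])
flat = data[flat_idx]

# Cut the flat result back into one array per slice
lengths = [stop - start for start, stop in bounds]
pieces = np.split(flat, np.cumsum(lengths)[:-1])
```

The result is a list of 1d arrays rather than a 2d array, since rows of unequal length cannot be stacked into a regular array.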

#6

We can use a list comprehension for this:

data=np.array([1,2,3,4,5,6,7,8,9,10])
data_extractions=[data[b:b+5] for b in [1,2,3,4,5]]
data_extractions

Results

[array([2, 3, 4, 5, 6]), array([3, 4, 5, 6, 7]), array([4, 5, 6, 7, 8]), array([5, 6, 7, 8, 9]), array([ 6,  7,  8,  9, 10])]
