Efficient NumPy 2D array construction from a 1D array

I have an array like this:

A = array([1,2,3,4,5,6,7,8,9,10])

And I am trying to get an array like this:

B = array([[1,2,3],
          [2,3,4],
          [3,4,5],
          [4,5,6]])

Each row (of a fixed, arbitrary width) is shifted by one element relative to the previous row. A is 10k records long, and I'm trying to find an efficient way of doing this in NumPy. Currently I am using vstack and a for loop, which is slow. Is there a faster way?

Edit:

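import numpy as np

width = 3         # fixed arbitrary width
length = len(A)   # length of A which I wish to use (10000 in my case)
B = A[0:width]    # first row
for i in range(1, length - width + 1):
    # each vstack copies all of B, so this loop is quadratic in length
    B = np.vstack((B, A[i:i + width]))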
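
Even before reaching for strides, most of the cost of the loop above comes from calling vstack once per row. A minimal sketch (the helper name windows_by_stacking is hypothetical) that collects the slices first and stacks them once, so each row is copied a single time:

```python
import numpy as np


def windows_by_stacking(A, width):
    """Hypothetical helper: build the 2D window array with one np.stack call."""
    n_rows = len(A) - width + 1                  # number of complete windows
    rows = [A[i:i + width] for i in range(n_rows)]
    return np.stack(rows)                        # one allocation for the result


A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
B = windows_by_stacking(A, 3)
print(B[:4])
```

This is still a Python-level loop, but its cost grows linearly with len(A) instead of quadratically.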

7 solutions

#1

Actually, there's an even more efficient way to do this... The downside to using vstack etc. is that you're making a copy of the array.

Incidentally, this is effectively identical to @Paul's answer, but I'm posting this just to explain things in a bit more detail...

There's a way to do this with just views so that no memory is duplicated.

I'm directly borrowing this from Erik Rigtorp's post to numpy-discussion, who in turn borrowed it from Keith Goodman's Bottleneck (which is quite useful!).

The basic trick is to directly manipulate the strides of the array (for one-dimensional arrays):

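import numpy as np

def rolling(a, window):
    # one row per window position, each row `window` items long
    shape = (a.size - window + 1, window)
    # both axes advance by a single item (assumes a contiguous 1D array)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(10)
print(rolling(a, 3))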

Here, a is your input array and window is the length of the window that you want (3, in your case).

This yields:

[[0 1 2]
 [1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]
 [6 7 8]
 [7 8 9]]

However, there is absolutely no duplication of memory between the original a and the returned array. This means that it's fast and scales much better than other options.
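
One way to check the no-copy claim, as a small sketch (rolling is repeated so the snippet runs on its own): np.shares_memory reports whether two arrays overlap in memory, and a write to the original array shows up in every window that covers that element.

```python
import numpy as np


def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


a = np.arange(10)
r = rolling(a, 3)
print(np.shares_memory(a, r))   # True: r is a view onto a's buffer

a[3] = 100                      # modify the original...
print(r[1:4])                   # ...and windows 1, 2 and 3 all see the change
```

The flip side is that writing into r also writes into a, so call .copy() on the result if you need an independent array.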

For example (using a = np.arange(100000) and window=3):

%timeit np.vstack([a[i:i-window] for i in range(window)]).T
1000 loops, best of 3: 256 us per loop

%timeit rolling(a, window)
100000 loops, best of 3: 12 us per loop

If we generalize this to a "rolling window" along the last axis for an N-dimensional array, we get Erik Rigtorp's "rolling window" function:

import numpy as np

def rolling_window(a, window):
    """
    Make an ndarray with a rolling window of the last dimension

    Parameters
    ----------
    a : array_like
        Array to add rolling window to
    window : int
        Size of rolling window

    Returns
    -------
    Array that is a view of the original array with an added dimension
    of size window.

    Examples
    --------
    >>> x = np.arange(10).reshape((2,5))
    >>> rolling_window(x, 3)
    array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
           [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

    Calculate rolling mean of last dimension:
    >>> np.mean(rolling_window(x, 3), -1)
    array([[ 1.,  2.,  3.],
           [ 6.,  7.,  8.]])

    """
    a = np.asarray(a)   # accept lists etc., as the docstring promises
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    # the last axis becomes (number of windows, window length)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # the new innermost axis reuses the stride of the original last axis
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
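
NumPy 1.20 and later also ship numpy.lib.stride_tricks.sliding_window_view, which builds the same kind of rolling view along a chosen axis and, unlike raw as_strided, returns a read-only view by default. A sketch comparing the two (rolling_window is repeated in compact form so the block runs alone):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view


def rolling_window(a, window):
    # compact copy of the function above, without the argument checks
    a = np.asarray(a)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)


x = np.arange(10).reshape((2, 5))
ours = rolling_window(x, 3)
builtin = sliding_window_view(x, 3, axis=-1)   # window over the last axis

print(np.array_equal(ours, builtin))           # True
print(np.mean(builtin, -1))                    # rolling mean of the last axis
print(builtin.flags.writeable)                 # False: read-only by default
```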

So, let's look into what's going on here... Manipulating an array's strides may seem a bit magical, but once you understand what's going on, it's not at all. The strides of a numpy array describe the size in bytes of the steps that must be taken to increment one value along a given axis. So, in the case of a 1-dimensional array of 64-bit floats, each item is 8 bytes long, and x.strides is (8,).

x = np.arange(9, dtype=np.float64)   # 64-bit floats: 8 bytes per item
print(x.strides)                     # (8,)

Now, if we reshape this into a 2D, 3x3 array, the strides will be (3 * 8, 8), as we would have to jump 24 bytes to increment one step along the first axis, and 8 bytes to increment one step along the second axis.
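
The rule behind these numbers, for a C-ordered (row-major) contiguous array, is that the stride of axis k is the item size times the product of all later dimensions. A small sketch that computes this by hand (the helper name c_order_strides is hypothetical) so it can be compared with what NumPy reports:

```python
import numpy as np


def c_order_strides(shape, itemsize):
    """Hypothetical helper: byte strides of a C-contiguous array."""
    strides = []
    step = itemsize
    for dim in reversed(shape):    # the last axis moves fastest
        strides.append(step)
        step *= dim                # one step on the next axis out skips `dim` of these
    return tuple(reversed(strides))


y = np.arange(9, dtype=np.float64).reshape(3, 3)
print(c_order_strides(y.shape, y.itemsize))   # (24, 8)
print(y.strides)                              # (24, 8)
```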

y = x.reshape(3, 3)
print(y.strides)   # (24, 8)

Similarly, for a square array like this one, a transpose is the same as just reversing the strides of the array (in general, the shape is reversed too):

print(y)
y.strides = y.strides[::-1]   # y now shows the values of the original y.T
print(y)

Clearly, the strides of an array and the shape of an array are intimately linked. If we change one, we have to change the other accordingly, otherwise we won't have a valid description of the memory buffer that actually holds the values of the array.
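
To see why the two have to change together, consider a non-square array: reversing only the strides of a 2x3 float64 array would make some elements point past the end of its 6-item buffer, whereas reversing shape and strides together reproduces the transpose exactly. A sketch that sets both at once with as_strided:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

z = np.arange(6, dtype=np.float64).reshape(2, 3)   # strides (24, 8)

# Reverse shape AND strides together: this describes exactly z.T.
zt = as_strided(z, shape=z.shape[::-1], strides=z.strides[::-1])
print(zt)
print(np.array_equal(zt, z.T))   # True
```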

Therefore, if you want to change both the shape and size of an array simultaneously, you can't do it just by setting x.strides and x.shape one after the other, even if the new strides and shape are compatible with each other.

That's where numpy.lib.stride_tricks.as_strided comes in. It's actually a very simple function that just sets the strides and shape of an array simultaneously.

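Setting both at once means numpy never validates the new shape against the old strides (or the new strides against the old shape), as would happen if you set the two independently. It also means nothing checks that the resulting view stays inside the original memory buffer, so a wrong shape or stride silently reads whatever memory lies beyond it. (Internally, as_strided works through numpy's __array_interface__, which allows arbitrary classes to describe a memory buffer as a numpy array.)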
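
Because as_strided trusts whatever it is given, a defensive wrapper can help. A sketch (the name safe_rolling is hypothetical) that validates the window, derives the shape from the array size, uses a.strides[0] instead of a.itemsize so that non-contiguous inputs such as a[::2] also work, and asks for a read-only view via as_strided's writeable argument (available since NumPy 1.12):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def safe_rolling(a, window):
    """Hypothetical helper: read-only rolling-window view of a 1D array."""
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("expected a 1D array")
    if not 1 <= window <= a.size:
        raise ValueError("window must be between 1 and a.size")
    shape = (a.size - window + 1, window)    # derived from the size, never guessed
    step = a.strides[0]                      # correct even for strided input
    return as_strided(a, shape=shape, strides=(step, step), writeable=False)


r = safe_rolling(np.arange(10)[::2], 3)     # a non-contiguous input
print(r)
```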

So, all we've done is make the array step one item forward (8 bytes in the case of a 64-bit dtype) along one axis, and also only one item (8 bytes) forward along the other axis.

In other words, with a "window" size of 3, the array has a shape of (whatever, 3), but instead of stepping a full 3 * x.itemsize to get from one row to the next (the first dimension), it only steps one item forward, effectively making the rows of the new array a "moving window" view into the original array.

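(This also means that, for the new array, x.shape[0] * x.shape[1] is generally larger than the number of elements actually stored in the original buffer, because overlapping windows reuse the same values.)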
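
Both points can be seen on a concrete array, as a sketch: the view's strides are (itemsize, itemsize), a contiguous copy of the same values has the usual (3 * itemsize, itemsize), and the view describes more elements than the buffer holds.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(10, dtype=np.float64)               # 10 items of 8 bytes
view = as_strided(x, shape=(8, 3), strides=(8, 8))
copy = view.copy()                                 # a real, contiguous 8x3 array

print(view.strides, copy.strides)                  # (8, 8) (24, 8)
print(view.size, x.size)                           # 24 10: described vs stored
print(np.shares_memory(view, x), np.shares_memory(copy, x))   # True False
```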

At any rate, hopefully that makes things slightly clearer.

#2

A Python loop is not an efficient way to implement this, since every iteration carries interpreter and type-checking overhead that is best avoided when working with numpy arrays. If your array is exceptionally tall, you will notice a large speed-up with this:

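import numpy

A = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
newshape = (4, 3)                        # 4 windows of width 3, as in B
newstrides = (A.itemsize, A.itemsize)    # advance one item per row and per column
B = numpy.lib.stride_tricks.as_strided(A, shape=newshape, strides=newstrides)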

This gives a view of the array A. If you want a new array you can edit, do the same but with .copy() at the end.

Details on strides:

The newstrides tuple in this case will be (4,4), assuming the array has 4-byte items (e.g. int32; with 8-byte items such as int64 it would be (8,8)). The first value means each row starts one item after the previous one, so you keep stepping through your data in single-item steps in the i-dimension; in a normal, contiguous 4x3 array it would be 12 (3 items * 4 bytes). The second value, 4, is the stride in the j-dimension: within each row you also read the buffer in 4-byte steps.
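
Since the byte values depend on the dtype, here is a sketch that builds this answer's view from A's own itemsize for both a 4-byte and an 8-byte integer dtype, next to the strides of an ordinary contiguous 4x3 array:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

for dtype in (np.int32, np.int64):
    A = np.arange(1, 11, dtype=dtype)
    B = as_strided(A, shape=(4, 3), strides=(A.itemsize, A.itemsize))
    normal = np.zeros((4, 3), dtype=dtype)     # contiguous, for comparison
    print(np.dtype(dtype).name, B.strides, normal.strides)
# int32 (4, 4) (12, 4)
# int64 (8, 8) (24, 8)
```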

Joe gives a nice, detailed description and makes things crystal-clear when he says that all this trick does is change strides and shape simultaneously.

#3

Which approach are you using?

import numpy as np
A = np.array([1,2,3,4,5,6,7,8,9,10])
width = 3

# note: both expressions below produce len(A) - width rows,
# i.e. they omit the final window [8, 9, 10]
np.vstack([A[i:i-len(A)+width] for i in range(len(A)-width)])
# needs 26.3µs

np.vstack([A[i:i-width] for i in range(width)]).T
# needs 13.2µs

If your width is relatively low (3) and you have a big A (10000 elements), then the difference is even more important: 32.4ms for the first and 44µs for the second.
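
To reproduce this kind of comparison outside IPython, a sketch using the standard timeit module, with the slicing adjusted so every version returns all len(A) - width + 1 windows. The helper names are hypothetical, the strided version assumes NumPy 1.20 or later, and the printed times depend on your machine, so none are quoted here.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.arange(10000)
width = 3
n_rows = len(A) - width + 1   # number of complete windows


def loop_over_rows():
    # one Python-level slice per output row (about 10k slices)
    return np.vstack([A[i:i + width] for i in range(n_rows)])


def loop_over_columns():
    # one slice per output column (only `width` slices), then transpose
    return np.vstack([A[i:i + n_rows] for i in range(width)]).T


def strided_view():
    # no copy: a read-only view onto A
    return sliding_window_view(A, width)


for f in (loop_over_rows, loop_over_columns, strided_view):
    seconds = timeit.timeit(f, number=10) / 10
    print(f"{f.__name__}: {seconds * 1e6:.1f} us per call")
```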

#4

Just to generalize @Joe's answer a bit further, here is a version with a step between consecutive windows:

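import numpy as np

def rolling(a, window, step=2):
    # number of windows that fit when each one starts `step` items after the last
    shape = ((a.size - window) // step + 1, window)
    # rows advance by `step` items, columns by one item
    strides = (a.itemsize * step, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(10)

print(rolling(a, 3))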

which outputs:

[[0 1 2]
 [2 3 4]
 [4 5 6]
 [6 7 8]]
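
An equivalent way to get stepped windows, sketched here with NumPy 1.20+'s sliding_window_view rather than hand-written strides: take every window, then slice with a step. Basic slicing of a view is still a view, so nothing is copied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)
stepped = sliding_window_view(a, 3)[::2]   # every 2nd window of width 3
print(stepped)
print(np.shares_memory(stepped, a))        # True: still a view
```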

To generalize further to the 2D case, i.e. to use it for patch extraction from an image:

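def rolling2d(a, win_h, win_w, step_h, step_w):
    h, w = a.shape
    n_h = (h - win_h) // step_h + 1    # patch positions down the rows
    n_w = (w - win_w) // step_w + 1    # patch positions across the columns
    s_row, s_col = a.strides           # byte strides of the input image
    # 4D view: (patch row, patch column, row within patch, column within patch)
    shape = (n_h, n_w, win_h, win_w)
    strides = (step_h * s_row, step_w * s_col, s_row, s_col)
    patches = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    # merging the two patch-position axes usually can't be a view, so this copies
    return patches.reshape(n_h * n_w, win_h, win_w)

a = np.arange(36).reshape(6, 6)
print(a)
print(rolling2d(a, 3, 3, 2, 2))

which outputs:

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]
[[[ 0  1  2]
  [ 6  7  8]
  [12 13 14]]

 [[ 2  3  4]
  [ 8  9 10]
  [14 15 16]]

 [[12 13 14]
  [18 19 20]
  [24 25 26]]

 [[14 15 16]
  [20 21 22]
  [26 27 28]]]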
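
On NumPy 1.20 or later, the same kind of patch grid can be taken from sliding_window_view with a 2D window shape, stepping by slicing the two window-position axes. A sketch for 3x3 patches with a step of 2 in both directions:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(36).reshape(6, 6)
windows = sliding_window_view(a, (3, 3))   # shape (4, 4, 3, 3): every 3x3 patch
patches = windows[::2, ::2]                # step 2 down and across: (2, 2, 3, 3)
flat = patches.reshape(-1, 3, 3)           # list of patches (this reshape copies)
print(flat.shape)                          # (4, 3, 3)
print(flat[2])                             # the patch whose top-left corner is a[2, 0]
```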

#5

I think this might be faster than looping when the width is fixed at a small number...

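import numpy
a = numpy.array([1,2,3,4,5,6])
b = numpy.reshape(a, (numpy.shape(a)[0], 1))   # column vector
# columns: a, a shifted left by 1, a shifted left by 2 (np.roll wraps around)
b = numpy.concatenate((b, numpy.roll(b,-1,0), numpy.roll(b,-2,0)), 1)
# keep only the len(a) - 2 rows that did not wrap around (width 3)
b = b[0:numpy.shape(a)[0] - 2, :]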
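
The same roll-based idea extends to any width by building the shifted columns in a loop. A sketch of that generalization (the function name windows_by_roll is hypothetical); it still allocates width full-length copies, so the stride-based answers remain preferable for large arrays:

```python
import numpy as np


def windows_by_roll(a, width):
    """Hypothetical generalization of the roll-based answer to any width."""
    col = a.reshape(-1, 1)                                      # column vector
    shifted = [np.roll(col, -k, axis=0) for k in range(width)]  # column k: a shifted left by k
    b = np.concatenate(shifted, axis=1)
    return b[:len(a) - width + 1]                               # drop rows that wrapped around


print(windows_by_roll(np.array([1, 2, 3, 4, 5, 6]), 3))
```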

EDIT: Clearly, the solutions using strides are superior to this, the only major disadvantage being that they are not yet well documented...

#6

Take a look at scikit-image's view_as_windows:

import numpy as np
from skimage.util.shape import view_as_windows
window_shape = (4, )
aa = np.arange(1000000000) # 1 billion
bb = view_as_windows(aa, window_shape)

Around 1 second.

#7

I'm using a more generalized function similar to that of @JustInTime, but applicable to N-dimensional arrays:

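import numpy as np
from numpy.lib import stride_tricks

def sliding_window(x, size, overlap=0):
    step = size - overlap  # in npts
    nwin = (x.shape[-1]-size)//step + 1              # windows along the last axis
    shape = x.shape[:-1] + (nwin, size)
    # consecutive windows start `step` items apart; items within a window are adjacent
    strides = x.strides[:-1] + (step*x.strides[-1], x.strides[-1])
    return stride_tricks.as_strided(x, shape=shape, strides=strides)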

An example:

x = np.arange(10)
sliding_window(x, 5, 3)
Out[1]: 
array([[0, 1, 2, 3, 4],
       [2, 3, 4, 5, 6],
       [4, 5, 6, 7, 8]])


x = np.arange(10).reshape((2,5))
sliding_window(x, 3, 1)
Out[2]: 
array([[[0, 1, 2],
        [2, 3, 4]],

       [[5, 6, 7],
        [7, 8, 9]]])
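
Note that sliding_window as written needs overlap < size: with overlap == size the step is zero and the window count divides by zero. A sketch of an equivalent built on NumPy 1.20+'s sliding_window_view that makes this check explicit (the name sliding_window_builtin is hypothetical):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_window_builtin(x, size, overlap=0):
    """Hypothetical equivalent of sliding_window built on sliding_window_view."""
    step = size - overlap
    if step < 1:
        raise ValueError("overlap must be smaller than size")
    # all windows along the last axis, then keep every `step`-th one
    return sliding_window_view(x, size, axis=-1)[..., ::step, :]


x = np.arange(10).reshape((2, 5))
print(sliding_window_builtin(x, 3, 1))
```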
