Computing local means in a 1D NumPy array

I have a 1D NumPy array as follows:

import numpy as np
d = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])

I want to calculate the means of (1,2,6,7), (3,4,8,9), and so on. Each mean covers 4 elements: two consecutive elements, plus the two consecutive elements that start 5 positions after the first.

I tried the following:

>>> from scipy import ndimage
>>> res = ndimage.uniform_filter(d, size=4)
>>> print(res)
[ 1  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

This unfortunately does not give me the desired results. How can I do it?

2 Answers

#1

Instead of indexing, you can approach this from a signal processing perspective. You are essentially performing a discrete convolution of your input signal with a 7-tap kernel whose three centre coefficients are 0 and whose two outermost coefficients on each side are 1. Since you want the average, multiply all of the coefficients by 1/4. You are not keeping the convolution result at every position, though; we will address that below. One way to do this is with scipy.ndimage.convolve1d (historically also exposed as scipy.ndimage.filters.convolve1d, a namespace that recent SciPy releases deprecate):

import numpy as np
from scipy import ndimage

# np.float was removed in NumPy 1.24; the builtin float is equivalent here
d = np.arange(1, 21, dtype=float)
ker = (1.0/4.0)*np.array([1,1,0,0,0,1,1], dtype=float)
out = ndimage.convolve1d(d, ker)[3:-3:2]

Because the kernel has 7 taps and each output value is centred on its window, the first three and last three outputs come from windows that run past the ends of the array and are filled in by boundary padding. Those six values need to be cropped out. Convolution also slides the window one element at a time, whereas you want the window to advance by two, so keep only every other element of what remains.

We get this for out:

In [47]: out
Out[47]: array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])

To double-check the result, work out a few elements by hand. The first element is (1+2+6+7)/4 = 4. The second element is (3+4+8+9)/4 = 6, and so on.
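
The same check can be scripted. The sketch below is only a sanity check and not part of the original answer: it rebuilds each mean with an explicit loop and compares it against the values shown above. The helper name brute_force_means is made up for illustration.

```python
import numpy as np

def brute_force_means(d):
    """Average d[i], d[i+1], d[i+5], d[i+6] for i = 0, 2, 4, ... (hypothetical helper)."""
    # Each window covers indices i .. i+6, so the last valid start is len(d) - 7.
    starts = range(0, len(d) - 6, 2)
    return np.array([(d[i] + d[i + 1] + d[i + 5] + d[i + 6]) / 4.0 for i in starts])

d = np.arange(1, 21, dtype=float)

expected = brute_force_means(d)
shown = np.array([4., 6., 8., 10., 12., 14., 16.])  # the output listed above

print(expected)
print(np.allclose(expected, shown))  # True
```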

For a solution with fewer headaches, try numpy.convolve with mode='valid'. This returns only the outputs whose windows lie fully inside the array, so there is no padding to crop at the left and right, but you still need to skip every other element:

import numpy as np

# np.float was removed in NumPy 1.24; the builtin float is equivalent here
d = np.arange(1, 21, dtype=float)
ker = (1.0/4.0)*np.array([1,1,0,0,0,1,1], dtype=float)
out = np.convolve(d, ker, mode='valid')[::2]

We also get:

In [59]: out
Out[59]: array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])
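
To see where the cropping goes, it may help to compare the output lengths of the three np.convolve modes for this 20-element signal and 7-tap kernel. This is an illustrative sketch rather than part of the original answer.

```python
import numpy as np

d = np.arange(1, 21, dtype=float)
ker = 0.25 * np.array([1, 1, 0, 0, 0, 1, 1], dtype=float)

full = np.convolve(d, ker, mode='full')    # every overlap, including partial ones
same = np.convolve(d, ker, mode='same')    # centred, same length as d
valid = np.convolve(d, ker, mode='valid')  # only windows fully inside d

print(len(full), len(same), len(valid))    # 26 20 14

# Cropping 3 elements from each end of 'same' leaves exactly the 'valid' part.
print(np.allclose(same[3:-3], valid))      # True
```

Note that np.convolve reverses the kernel before sliding it. That is harmless here because [1,1,0,0,0,1,1] is symmetric; for an asymmetric pattern, np.correlate applies the kernel as written.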

Finally if you want indexing, something like this may suffice:

length = len(d[6::2])
out = np.array([(a+b+c+e)/4.0 for (a,b,c,e) in zip(d[::2][:length], d[1::2][:length], d[5::2][:length], d[6::2])])

We get:

In [69]: out
Out[69]: array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])

This is really ugly, but it works. Each window spans 7 elements, so its last element sits 6 positions after its first. The slice that starts at offset 6, d[6::2], therefore holds the last element of every window, and its length gives the number of outputs. Within a window, each of the four elements reaches its counterpart in the next window by stepping 2 positions forward. That yields four sequences, each stepping by 2 but starting at a different offset: 0, 1, 5 and 6. We zip the four sequences together (truncating the first three to the common length), then sum and average each tuple of four.
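
If the zip feels heavy, the same four offset slices can be added as whole arrays. This is a sketch of an alternative, not the answer's own code; the variable name n_out is introduced here for illustration.

```python
import numpy as np

d = np.arange(1, 21, dtype=float)

# The slice starting at offset 6 is the shortest, so its length sets the output length.
n_out = len(d[6::2])

# Four views of d, each stepping by 2, starting at offsets 0, 1, 5 and 6.
out = (d[0::2][:n_out] + d[1::2][:n_out] + d[5::2][:n_out] + d[6::2]) / 4.0

print(out)
```

The arithmetic runs inside NumPy rather than in a Python loop, which tends to matter more as the array grows.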

BTW, I still like convolution better.

#2

You can use numpy.lib.stride_tricks.as_strided() to obtain a grouping array applicable for a more generic case:

import numpy as np
from numpy.lib.stride_tricks import as_strided

d = np.arange(1, 21)

consec = 2
offset = 5
nsub = 2
pace = 2

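s = d.strides[0]  # bytes between neighbouring elements of d
# a group spans consec + (nsub-1)*offset elements; count the starts that fit inside d
ngroups = (d.shape[0] - (consec + (nsub-1)*offset))//pace + 1
a = as_strided(d, shape=(ngroups, nsub, consec),
               strides=(pace*s, offset*s, 1*s))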

Where:

consec is the number of consecutive elements in each sub-group
offset is the distance between the first entries of successive sub-groups within a group
nsub is the number of sub-groups per group (1, 2 is one sub-group, separated from the second sub-group 6, 7 by offset)
pace is the stride between the first entries of two successive groups, which in your case is pace=consec, but could be different in a more general case
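
One way to see what the four parameters do is to build the same grouping with plain integer indices instead of strides. This is a sketch that assumes a copy of the data is acceptable; the names g, k, j and idx are illustrative only, and the group count is written so that it equals 7 for these values.

```python
import numpy as np

d = np.arange(1, 21)
consec, offset, nsub, pace = 2, 5, 2, 2

# Number of group start positions whose last element still lies inside d.
ngroups = (d.shape[0] - (consec + (nsub - 1) * offset)) // pace + 1

g = np.arange(ngroups)[:, None, None]  # group number
k = np.arange(nsub)[None, :, None]     # sub-group number within a group
j = np.arange(consec)[None, None, :]   # position within a sub-group

idx = pace * g + offset * k + j        # broadcasts to shape (ngroups, nsub, consec)
a = d[idx]                             # fancy indexing returns a copy, not a view

print(a[0])                            # first group: [[1 2], [6 7]]
print(a.mean(axis=(1, 2)))
```

Each element's position is simply pace*g + offset*k + j, which is the same arithmetic that the stride tuple encodes in bytes.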

In your case (using the given values) a would be:

array([[[ 1,  2],
        [ 6,  7]],

       [[ 3,  4],
        [ 8,  9]],

       [[ 5,  6],
        [10, 11]],

       [[ 7,  8],
        [12, 13]],

       [[ 9, 10],
        [14, 15]],

       [[11, 12],
        [16, 17]],

       [[13, 14],
        [18, 19]]])

From here, the desired averages follow directly by taking the mean over the last two axes:

a.mean(axis=-1).mean(axis=-1)

#array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])
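
as_strided does no bounds checking, so a wrong shape or stride can silently read memory outside the array. On NumPy 1.20 or later, sliding_window_view offers a checked alternative. The sketch below is a variant for this specific pattern, not the original answer's method: it takes every length-7 window, keeps every second one, and picks columns 0, 1, 5 and 6.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

d = np.arange(1, 21)

windows = sliding_window_view(d, 7)      # shape (14, 7), read-only view of d
picked = windows[::2][:, [0, 1, 5, 6]]   # every second window, the four wanted columns
out = picked.mean(axis=1)

print(out)
```

This trades the generality of the consec/offset/nsub/pace parameters for a shorter expression with the window length and column choice hard-coded.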
