How can I speed up a nested loop in Python?

Time: 2021-12-15 12:37:44

I'm running a nested loop in Python, included below. It serves as a basic way of searching through an existing financial time series for periods that match certain characteristics. In this case there are two separate, equally sized arrays representing the 'close' (i.e. the price of an asset) and the 'volume' (i.e. the amount of the asset exchanged over the period). For each point in time I would like to look forward over all future intervals with lengths between 1 and INTERVAL_LENGTH and see whether any of them match my search (in this case, the ratio of the close values is greater than 1.0001 and less than 1.5, and the summed volume is greater than 100).

My understanding is that one of the major reasons for the speedup with NumPy is that the interpreter doesn't need to type-check the operands every time it evaluates something, as long as you operate on the array as a whole (e.g. numpy_array * 2), but obviously the code below doesn't take advantage of that. Is there a way to replace the inner loop with some kind of window function that could give a speedup, or any other way to use numpy/scipy to speed this up substantially while staying in Python?

Alternatively, is there a better way to do this in general (e.g. will it be much faster to write this loop in C++ and use weave)?

import numpy as np

ARRAY_LENGTH = 500000
INTERVAL_LENGTH = 15
close = np.array( xrange(ARRAY_LENGTH) )
volume = np.array( xrange(ARRAY_LENGTH) )
close, volume = close.astype('float64'), volume.astype('float64')

results = []
for i in xrange(len(close) - INTERVAL_LENGTH):
    for j in xrange(i+1, i+INTERVAL_LENGTH):
        ret = close[j] / close[i]
        vol = sum( volume[i+1:j+1] )
        if ret > 1.0001 and ret < 1.5 and vol > 100:
            results.append( [i, j, ret, vol] )
print results
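
To make the whole-array point above concrete: for one fixed look-ahead offset k, the ratio test can be evaluated for every starting index i in a single array expression, with no Python-level loop. This is a minimal editorial sketch, not part of the original question; the offset k = 5 is arbitrary.

k = 5  # one fixed look-ahead offset, i.e. j = i + k (chosen arbitrarily for the sketch)
ratios = close[k:] / close[:-k]                     # close[i+k] / close[i] for every i
hits = np.where((ratios > 1.0001) & (ratios < 1.5))[0]
print hits[:10]  # starting indices i whose offset-k ratio passes the test
# (with the arange sample data, close[0] == 0 triggers a harmless divide-by-zero warning)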

3 Answers

#1


6  

Update: (almost) completely vectorized version below in "new_function2"...

I'll add comments to explain things in a bit.

It gives a ~50x speedup, and a larger speedup is possible if you're okay with the output being numpy arrays instead of lists. As is:

In [86]: %timeit new_function2(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 1.15 s per loop

You can replace your inner loop with a call to np.cumsum()... See my "new_function" function below. This gives a considerable speedup...

In [61]: %timeit new_function(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 15.7 s per loop

vs

In [62]: %timeit old_function(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 53.1 s per loop

It should be possible to vectorize the entire thing and avoid for loops entirely, though... Give me a minute, and I'll see what I can do...

import numpy as np

ARRAY_LENGTH = 500000
INTERVAL_LENGTH = 15
close = np.arange(ARRAY_LENGTH, dtype=np.float)
volume = np.arange(ARRAY_LENGTH, dtype=np.float)

def old_function(close, volume, INTERVAL_LENGTH):
    results = []
    for i in xrange(len(close) - INTERVAL_LENGTH):
        for j in xrange(i+1, i+INTERVAL_LENGTH):
            ret = close[j] / close[i]
            vol = sum( volume[i+1:j+1] )
            if (ret > 1.0001) and (ret < 1.5) and (vol > 100):
                results.append( (i, j, ret, vol) )
    return results


def new_function(close, volume, INTERVAL_LENGTH):
    results = []
    for i in xrange(close.size - INTERVAL_LENGTH):
        vol = volume[i+1:i+INTERVAL_LENGTH].cumsum()
        ret = close[i+1:i+INTERVAL_LENGTH] / close[i]

        filter = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
        j = np.arange(i+1, i+INTERVAL_LENGTH)[filter]

        tmp_results = zip(j.size * [i], j, ret[filter], vol[filter])
        results.extend(tmp_results)
    return results

def new_function2(close, volume, INTERVAL_LENGTH):
    # Build one row per look-ahead offset k (j = i + k); each column then
    # corresponds to a starting index i.
    vol, ret = [], []
    I, J = [], []
    for k in xrange(1, INTERVAL_LENGTH):
        start = k
        end = volume.size - INTERVAL_LENGTH + k
        vol.append(volume[start:end])
        ret.append(close[start:end])
        J.append(np.arange(start, end))
        I.append(np.arange(volume.size - INTERVAL_LENGTH))

    vol = np.vstack(vol)
    ret = np.vstack(ret)
    J = np.vstack(J)
    I = np.vstack(I)

    # A cumulative sum down the offset axis turns row k into sum(volume[i+1:j+1])
    # for j = i + k; dividing by close[i] (broadcast along each row) gives
    # close[j] / close[i].
    vol = vol.cumsum(axis=0)
    ret = ret / close[:-INTERVAL_LENGTH]

    filter = (ret > 1.0001) & (ret < 1.5) & (vol > 100)

    vol = vol[filter]
    ret = ret[filter]
    I = I[filter]
    J = J[filter]

    output = zip(I.flat,J.flat,ret.flat,vol.flat)
    return output

results = old_function(close, volume, INTERVAL_LENGTH)
results2 = new_function(close, volume, INTERVAL_LENGTH)
results3 = new_function2(close, volume, INTERVAL_LENGTH)

# Using sets to compare, as the output 
# is in a different order than the original function
print set(results) == set(results2)
print set(results) == set(results3)
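
Picking up the remark above that a larger speedup is possible if the output can stay as numpy arrays instead of lists: here is a sketch of that variant (an editorial addition, not from the original answer; the name new_function3 is made up). It is identical to new_function2 except that it skips the final zip and returns four parallel arrays.

def new_function3(close, volume, INTERVAL_LENGTH):
    # Same stacking trick as new_function2, but the matches come back as four
    # parallel numpy arrays (I, J, ret, vol) instead of a list of tuples.
    vol, ret, I, J = [], [], [], []
    for k in xrange(1, INTERVAL_LENGTH):
        start, end = k, volume.size - INTERVAL_LENGTH + k
        vol.append(volume[start:end])
        ret.append(close[start:end])
        J.append(np.arange(start, end))
        I.append(np.arange(volume.size - INTERVAL_LENGTH))

    vol = np.vstack(vol).cumsum(axis=0)              # sum(volume[i+1:j+1]) for every (i, j)
    ret = np.vstack(ret) / close[:-INTERVAL_LENGTH]  # close[j] / close[i] for every (i, j)
    I, J = np.vstack(I), np.vstack(J)

    mask = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
    return I[mask], J[mask], ret[mask], vol[mask]

The zip(I.flat, J.flat, ...) call in new_function2 is what builds the Python-level list of tuples one element at a time; leaving the results as arrays avoids that per-element work entirely.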

#2


3  

One speedup would be to remove the sum() call: in this implementation it re-slices the array and sums between 1 and INTERVAL_LENGTH - 1 elements on every inner iteration. Instead, keep a running total: start vol at zero before the inner loop and add volume[j] on each iteration. After the first iteration (j = i+1) it equals volume[i+1], which is exactly sum(volume[i+1:i+2]), and each later iteration only adds a single value instead of slicing and summing the whole window again.

Another small speedup is to bind the list method to a local name outside the loop (ex = results.extend in the code below), so the attribute lookup isn't repeated for every match; .extend is also faster than repeated .append when several records are added in one call. Just note that extend adds the elements of its argument, so a single record has to stay wrapped in an outer list to keep one entry per match.

You could also break up the final if statement so that the cheap test runs first and the rest is only computed when it's needed. For instance, if vol <= 100 you don't need to compute ret at all.

This doesn't answer your question exactly, but especially because of the sum issue, I think you should see a significant speedup from these changes.

Edit - you also don't need len(), since you already know the length of the arrays (unless that was just for the example). Using the known number rather than calling len(something) is slightly faster.

Edit - implementation (this is untested):

import numpy as np

ARRAY_LENGTH = 500000
INTERVAL_LENGTH = 15
close = np.array( xrange(ARRAY_LENGTH) )
volume = np.array( xrange(ARRAY_LENGTH) )
close, volume = close.astype('float64'), volume.astype('float64')

results = []
ex = results.extend  # bind the method once, outside the loops
for i in xrange(ARRAY_LENGTH - INTERVAL_LENGTH):
    vol = 0.0  # running value of sum(volume[i+1:j+1])
    for j in xrange(i+1, i+INTERVAL_LENGTH):
        vol += volume[j]  # grow the window sum by one element per iteration
        if vol > 100:  # cheap test first; skip the division when it fails
            ret = close[j] / close[i]
            if 1.0001 < ret < 1.5:
                ex( [[i, j, ret, vol]] )  # extra brackets: extend adds the elements of its argument
print results
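
Since the implementation above is flagged as untested (and running sums are easy to get off by one), a quick sanity check against the original brute-force double loop on a small random input might look like this. It is an editorial sketch, not part of the original answer, and the helper names brute_force and incremental are made up.

def brute_force(close, volume, interval):
    out = []
    for i in xrange(len(close) - interval):
        for j in xrange(i+1, i+interval):
            ret = close[j] / close[i]
            vol = sum(volume[i+1:j+1])          # re-slice and re-sum every time
            if 1.0001 < ret < 1.5 and vol > 100:
                out.append((i, j, ret, vol))
    return out

def incremental(close, volume, interval):
    out = []
    for i in xrange(len(close) - interval):
        vol = 0.0                               # running value of sum(volume[i+1:j+1])
        for j in xrange(i+1, i+interval):
            vol += volume[j]
            if vol > 100:
                ret = close[j] / close[i]
                if 1.0001 < ret < 1.5:
                    out.append((i, j, ret, vol))
    return out

small_close = np.random.uniform(99.0, 101.0, 2000)
small_volume = np.random.uniform(0.0, 20.0, 2000)
print incremental(small_close, small_volume, 15) == brute_force(small_close, small_volume, 15)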

#3


1  

Why don't you try to generate the result as a single list comprehension (which avoids the per-item append/extend calls), something like:

results = [ t for t in ( (i, j, close[j]/close[i], sum(volume[i+1:j+1]))
                         for i in xrange(len(close) - INTERVAL_LENGTH)
                             for j in xrange(i+1, i+INTERVAL_LENGTH)
                       )
            if t[3] > 100 and 1.0001 < t[2] < 1.5
          ]
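
One caveat worth adding (an editorial note, not part of the original answer): the comprehension still re-slices and re-sums the volume window and builds a tuple for every (i, j) pair before filtering anything. A variant that precomputes a prefix-sum array, so each window total is a single subtraction and the tests run before any tuple is built, might look like this (reusing close, volume and INTERVAL_LENGTH from the question):

vsum = np.concatenate(([0.0], np.cumsum(volume)))   # vsum[k] == sum(volume[:k])

results = [ (i, j, close[j]/close[i], vsum[j+1] - vsum[i+1])
            for i in xrange(len(close) - INTERVAL_LENGTH)
                for j in xrange(i+1, i+INTERVAL_LENGTH)
            if vsum[j+1] - vsum[i+1] > 100 and 1.0001 < close[j]/close[i] < 1.5
          ]

The ratio and the window total are each computed twice for the pairs that pass, which is still far cheaper than summing the slice from scratch; note the prefix-sum totals can differ from a direct sum() in the last bits of floating point.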
