Efficiency with very large numpy arrays

Time: 2022-12-06 21:28:45

I'm working with some very large arrays. An issue that I'm dealing with of course is running out of RAM to work with, but even before that my code is running slowly so that, even if I had infinite RAM, it would still take way too long. I'll give a bit of my code to show what I'm trying to do:

#samplez is a 3 million element 1-D array
#zfit is a 10,000 x 500 2-D array

b = np.arange(len(zfit))

for x in samplez:
    a = x-zfit
    mask = np.ma.masked_array(a)
    mask[a <= 0] = np.ma.masked
    index = mask.argmin(axis=1)
    #  These past 4 lines give me an index array of the smallest positive number 
    #  in x - zfit       

    d = zfit[b,index]
    e = zfit[b,index+1]
    f = (x-d)/(e-d)
    # f is the calculation I am after

    if x == samplez[0]:
       g = f
       index_stack = index
    else:
       g = np.vstack((g,f))
       index_stack = np.vstack((index_stack,index))

I need to use g and index_stack, each of which is a 3 million x 10,000 2-D array, in a further calculation. Each iteration of this loop takes almost 1 second, so 3 million seconds in total, which is way too long.

Is there anything I can do to make this calculation run much faster? I've tried to think of a way to do it without the for loop, but the only one I can imagine is making 3 million copies of zfit, which is unfeasible.

And is there some way I can work with these arrays without keeping everything in RAM? I'm a beginner, and everything I've found while searching for this is either irrelevant or something I can't understand. Thanks in advance.

1 solution

#1

It is good to know that the smallest positive number never falls in the last column of a row (so index+1 is always valid).

In samplez there are 1 million unique values, but each row of zfit can hold at most 500 distinct values, so the entire zfit contains at most 5 million distinct values. The algorithm can be sped up dramatically if the number of "find the smallest positive x - zfit" searches can be cut down. Doing all ~10^13 pairwise comparisons is probably overkill, and careful planning should be able to eliminate a large proportion of them. How far you can go with that depends heavily on your actual underlying mathematics.

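For example, if each row of zfit is sorted in ascending order (an assumption about your data, not something stated in the question, though the test data further down is sorted this way), the whole search can be done with one vectorized np.searchsorted call per row instead of one masked scan per element of samplez. A rough sketch of that idea, with hypothetical names, looping 10,000 times over rows rather than 3 million times over samplez:

import numpy as np

def bracket_by_row(samplez, zfit):
    # Assumes every row of zfit is sorted ascending and brackets every value in
    # samplez, so the found index is never the first or the last column.
    n_rows = len(zfit)
    index = np.empty((n_rows, len(samplez)), dtype=np.int64)
    f = np.empty((n_rows, len(samplez)))
    for i, row in enumerate(zfit):                # 10,000 iterations, not 3,000,000
        right = np.searchsorted(row, samplez)     # first column with row[j] >= x, for every x at once
        left = right - 1                          # largest column with row[j] < x
        d, e = row[left], row[right]
        index[i] = left
        f[i] = (samplez - d) / (e - d)
    # Transpose to match the g / index_stack orientation used in the question.
    return f.T, index.T

The same per-row search also works on a chunk of samplez at a time if the full output is too large to hold in memory.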

Without knowing that, there are still a couple of small things that can be done. First, there are not that many possible values of (e-d), so that difference can be computed once outside the loop. Second, the Python for loop can be replaced with map. These two small fixes, on my machine, give about a 22% speed-up.

import numpy as np

def function_map(samplez, zfit):
    rows = np.arange(len(zfit))
    diff = zfit[:, 1:] - zfit[:, :-1]        # e - d for every adjacent pair, computed once outside the loop
    def _fuc1(x):
        a = x - zfit
        mask = np.ma.masked_array(a)
        mask[a <= 0] = np.ma.masked
        index = mask.argmin(axis=1)          # per row: column of the smallest positive x - zfit
        d = zfit[rows, index]
        f = (x - d) / diff[rows, index]      # constraint: the smallest value is never in the last column
        return (index, f)
    result = map(_fuc1, samplez)
    return (np.array([item[1] for item in result]),
            np.array([item[0] for item in result]))

Next: masked_array can be avoided completely (which should bring a significant improvement), but samplez needs to be sorted for this to work.

def function_map2(samplez, zfit):
    rows = np.arange(len(zfit))
    _diff = np.diff(zfit, axis=1)            # e - d for every adjacent pair
    _zfit = zfit * 1                         # working copy that gets progressively masked
    def _fuc1(x):
        _zfit[_zfit < x] = np.inf            # samplez is sorted, so this masking only ever grows
        index = np.nanargmin(_zfit, axis=1)
        d = zfit[rows, index]
        f = (x - d) / _diff[rows, index]     # constraint: the smallest value is never in the last column
        return (index, f)
    result = map(_fuc1, samplez)
    return (np.array([item[1] for item in result]),
            np.array([item[0] for item in result]))

>>> x1 = np.arange(50)
>>> x2 = np.sort(np.random.random(size=(20, 10)) * 120, axis=1)  # sort so the last element of each row is larger than the largest value in x1
>>> x3 = x2 * 1
>>> f0 = lambda: function_map(x1, x2)
>>> f1 = lambda: function_map2(x1, x3)
>>> import timeit
>>> t1 = timeit.Timer('f1()', 'from __main__ import f1')
>>> t0 = timeit.Timer('f0()', 'from __main__ import f0')
>>> t0.timeit(5)
0.09083795547485352
>>> t1.timeit(5)
0.05301499366760254
>>> t0.timeit(50)
0.8838210105895996
>>> t1.timeit(50)
0.5063929557800293
>>> t0.timeit(500)
8.900799036026001
>>> t1.timeit(500)
4.614129018783569

So, that is another 50% speed-up.

The masked_array is avoided, which saves some RAM. I can't think of much else to reduce RAM usage. It may be necessary to process samplez in parts. Also, depending on the data and the required precision, using float16 or float32 instead of the default float64 can save you a lot of RAM.

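As a minimal sketch of what processing samplez in parts could look like (the chunk size, the file names, and the reuse of function_map2 here are illustrative assumptions, not a prescription), the two big output arrays can be written block by block into disk-backed np.memmap files so they never have to sit in RAM in full, and stored as float32 / int32 to halve the footprint:

import numpy as np

n_samples = len(samplez)                 # 3,000,000
n_rows = len(zfit)                       # 10,000

# Disk-backed output arrays (hypothetical file names); float32/int32 instead of float64/int64.
g_out = np.memmap('g.dat', dtype=np.float32, mode='w+', shape=(n_samples, n_rows))
index_out = np.memmap('index_stack.dat', dtype=np.int32, mode='w+', shape=(n_samples, n_rows))

chunk = 10_000                           # tune to the RAM you have
for start in range(0, n_samples, chunk):
    stop = min(start + chunk, n_samples)
    # function_map2 from above; samplez must already be sorted for it to be valid
    f_block, index_block = function_map2(samplez[start:stop], zfit)
    g_out[start:stop] = f_block
    index_out[start:stop] = index_block

g_out.flush()
index_out.flush()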