Zeros or empty NumPy array

Time: 2021-11-21 21:23:02

I am writing code where efficiency is very important. I need a 2D array that I fill with 0s and 1s in a for loop. Which approach is better, and why?

  1. Make an empty array and fill it with "0" and "1". This is pseudocode; my real array will be much bigger.

  2. Make an array filled with zeros, then use an if() and put a "1" wherever the value is not zero.

So I need to know which is more efficient: 1. writing every element, "0" or "1", into an empty array, or 2. evaluating an if() (which has its own cost) and then writing only the "1" elements.

3 solutions

#1


2  

  • empty() does not initialize the memory, so your array will be filled with garbage and you will have to initialize every cell yourself.
  • zeros() initializes everything to 0. So if your final result contains lots of zeros, this saves you the time of setting all those cells to zero manually.
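
A minimal sketch of the difference between the two constructors. The shape (3, 4), the int8 dtype and the position of the single 1 are arbitrary choices for illustration, not something taken from the question:

```python
import numpy as np

shape = (3, 4)  # assumed small 2D shape, for illustration only

# np.zeros guarantees that every cell starts at 0.
z = np.zeros(shape, dtype=np.int8)
z[1, 2] = 1       # only the ones need to be written

# np.empty only reserves memory; its contents are arbitrary until written,
# so every cell must be assigned before the array is read.
e = np.empty(shape, dtype=np.int8)
e[:] = 0          # explicit initialisation of all cells
e[1, 2] = 1       # then set the ones

print(np.array_equal(z, e))
```

Both arrays end up identical; the only difference is who writes the zeros.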

I would go with zeros(). The performance bottleneck will be your Python for loop anyway.

Fortunately, NumPy code now has a JIT compiler, Numba, which can turn your crummy and slow Python for loop into machine code:

http://numba.pydata.org/

I tried it. It's a bit rough around the edges, but the speedups can be quite spectacular compared to bare Python code. Of course the best choice is to vectorize using NumPy, but you don't always have that option.
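
As a rough sketch of both routes, the loop below is compiled with Numba's njit decorator when Numba is installed and otherwise runs as plain Python, and it is compared against a vectorized equivalent. The function names, the random input and the 0.5 threshold are hypothetical stand-ins for whatever the real calculation is:

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback: run the plain Python loop if Numba is unavailable.
        return func


@njit
def fill_loop(values, threshold):
    # Explicit double loop: start from zeros and write only the ones.
    rows, cols = values.shape
    out = np.zeros((rows, cols), dtype=np.int8)
    for i in range(rows):
        for j in range(cols):
            if values[i, j] > threshold:
                out[i, j] = 1
    return out


def fill_vectorized(values, threshold):
    # Same result without a Python-level loop.
    return (values > threshold).astype(np.int8)


values = np.random.rand(200, 300)   # hypothetical input data
a = fill_loop(values, 0.5)
b = fill_vectorized(values, 0.5)
print(np.array_equal(a, b))
```

When the per-element rule can be written as an array expression, as it can here, the vectorized form is usually the simpler choice; the compiled loop is for cases where it cannot.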

#2


1  

import numpy as np
Ae = np.empty(10000)
A0 = np.zeros(10000)

differ slightly in how the memory is initially allocated. But any difference in time will be minor if you go on to do something like

for i in range(10000):
    Ae[i] = <some calc>

or

for i in range(10000):
    val = <some calc>
    if val > 0:
        A0[i] = val

If I had to loop like this, I'd go ahead and use np.zeros, and also use the unconditional assignment. It keeps the code simpler, and compared to everything else that is going on in the loop, the time differences will be minor.


Sample times:

In [33]: def foo0(N):
    ...:     A = np.empty(N,int)
    ...:     for i in range(N):
    ...:         A[i] = np.random.randint(0,2)
    ...:     return A
    ...: 
In [34]: def foo1(N):
    ...:     A = np.zeros(N,int)
    ...:     for i in range(N):
    ...:         val = np.random.randint(0,2)
    ...:         if val:
    ...:             A[i] = val
    ...:     return A
    ...: 

Three ways of assigning ten 0/1 values:

In [35]: foo0(10)
Out[35]: array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0])
In [36]: foo1(10)
Out[36]: array([0, 1, 1, 1, 1, 1, 1, 1, 0, 0])
In [37]: np.random.randint(0,2,10)
Out[37]: array([0, 1, 1, 0, 1, 1, 1, 0, 0, 1])

Times:

In [38]: timeit foo0(1000)
100 loops, best of 3: 4.06 ms per loop
In [39]: timeit foo1(1000)
100 loops, best of 3: 3.95 ms per loop
In [40]: timeit np.random.randint(0,2,1000)
... cached.
100000 loops, best of 3: 13.6 µs per loop

The two loop times are nearly the same, and both are roughly 300 times slower than the single vectorized np.random.randint call.
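
Outside IPython, the same comparison can be sketched as a standalone script using the standard timeit module. The function names and the repeat counts are illustrative choices, and the absolute numbers will differ from machine to machine:

```python
import timeit

import numpy as np


def fill_empty(n):
    # np.empty plus an unconditional write to every cell.
    a = np.empty(n, dtype=int)
    for i in range(n):
        a[i] = np.random.randint(0, 2)
    return a


def fill_zeros(n):
    # np.zeros plus a conditional write of the ones only.
    a = np.zeros(n, dtype=int)
    for i in range(n):
        val = np.random.randint(0, 2)
        if val:
            a[i] = val
    return a


def fill_vectorized(n):
    # One call that draws all n values at once.
    return np.random.randint(0, 2, n)


n = 1000
for func in (fill_empty, fill_zeros, fill_vectorized):
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(n), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```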

#3


1  

It is better to create an array of zeros and fill it using if-else. Even though conditions slow your code down, reshaping an empty array or concatenating new vectors onto it on each iteration of the loop is a much slower operation, because each time a new array of the new size is created and the old array is copied into it, value by value, together with the new vector.
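
The copying cost described above can be illustrated with a small sketch. The sizes, the seeded random rows and the variable names are assumptions made for the example only:

```python
import numpy as np

n_rows, n_cols = 500, 8          # assumed sizes, for illustration only
rng = np.random.default_rng(0)   # seeded so both versions see the same rows
rows = rng.integers(0, 2, size=(n_rows, n_cols))

# Growing the array: every np.concatenate call allocates a new array
# and copies all previously stored rows into it.
grown = np.empty((0, n_cols), dtype=rows.dtype)
for r in rows:
    grown = np.concatenate([grown, r[np.newaxis, :]])

# Preallocating: one allocation, then each row is written in place.
prealloc = np.zeros((n_rows, n_cols), dtype=rows.dtype)
for i, r in enumerate(rows):
    prealloc[i] = r

print(np.array_equal(grown, prealloc))
```

Both loops produce the same array, but the growing version copies an ever larger array on every iteration, so its total work grows quadratically with the number of rows.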
