numpy: how to convert an array's type quickly

Date: 2021-08-12 21:27:42

I find the astype() method of numpy arrays not very efficient. I have an array containing 3 million uint8 points. Multiplying it by a 3x3 matrix takes 2 seconds, but converting the result from uint16 to uint8 takes another second.

More precisely:

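    import time
    import numpy as np

    # imgarray (the uint8 points) and M (the 3x3 matrix) are defined elsewhere.
    # time.clock() no longer exists in Python 3.8+; perf_counter() replaces it.
    print(time.perf_counter())
    imgarray = np.dot(imgarray, M) // 255   # // keeps Python 2's integer division
    print(time.perf_counter())
    imgarray = imgarray.clip(0, 255)
    print(time.perf_counter())
    imgarray = imgarray.astype('B')         # 'B' is the type code for uint8
    print(time.perf_counter())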

The dot product and scaling take 2 s, clipping takes 200 ms, and the type conversion takes 1 s.

Given the time taken by the other operations, I would expect astype to be faster. Is there a faster way to do the type conversion, or am I wrong in guessing that type conversion should not be that hard?

Edit: the goal is to save the final 8-bit array to a file.

1 answer

#1



When you use imgarray = imgarray.astype('B'), you get a copy of the array, cast to the specified type. This requires an extra memory allocation, even though you immediately rebind imgarray to the newly allocated array.
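
The copying behaviour can be checked directly. This is a minimal sketch, assuming a small made-up uint32 array in place of the real image data; `converted` is a name introduced here for illustration only.

```python
import numpy as np

# Hypothetical stand-in for the clipped result of np.dot (values already in 0..255).
imgarray = np.arange(12, dtype=np.uint32).reshape(4, 3)

# 'B' is the type code for uint8; astype always builds a new array here.
converted = imgarray.astype('B')

print(converted.dtype)                         # uint8
print(np.shares_memory(imgarray, converted))   # False: a separate buffer was allocated
```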

If you use imgarray.view('uint8'), then you get a view of the array. This uses the same data, except that it is interpreted as uint8 instead of imgarray.dtype. (In this case np.dot returns a uint32 array, so after the np.dot, imgarray is of type uint32.)

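The problem with using view, however, is that each 32-bit integer is then seen as four 8-bit integers, and we only care about the value in the lowest 8 bits (after clipping to 0-255, the other three bytes are zero). On a little-endian machine the lowest byte comes first, so we need to take every 4th 8-bit integer. We can do that with slicing: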

imgarray.view('uint8')[:,::4]
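
To see that the slice really picks out the same values as astype, here is a small sketch. The sample data and the names `as_bytes`, `start` and `low_bytes` are assumptions made for illustration. The byte-order check is also an addition: the one-liner above assumes a little-endian machine, where the low byte comes first.

```python
import sys
import numpy as np

# Hypothetical sample data standing in for the clipped np.dot result.
imgarray = np.array([[0, 17, 255], [3, 128, 64]], dtype=np.uint32)

# Reinterpret each uint32 as four uint8 bytes: shape (2, 3) becomes (2, 12).
as_bytes = imgarray.view('uint8')

# The low byte is the first of each group on little-endian machines,
# and the last of each group on big-endian ones.
start = 0 if sys.byteorder == 'little' else 3
low_bytes = as_bytes[:, start::4]

print(low_bytes.shape)                                   # (2, 3)
print(np.array_equal(low_bytes, imgarray.astype('B')))   # True
print(np.shares_memory(low_bytes, imgarray))             # True: no data was copied
```

Because the result shares memory with the original array, later writes to one are visible through the other.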

IPython's %timeit command shows a significant speedup when doing things this way:

In [37]: %timeit imgarray2 = imgarray.astype('B')
10000 loops, best of 3: 107 us per loop

In [39]: %timeit imgarray3 = imgarray.view('B')[:,::4]
100000 loops, best of 3: 3.64 us per loop
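
Outside IPython, a comparable measurement can be sketched with the standard timeit module. The array shape below is an assumption (the post does not say what was timed), and the numbers will differ from machine to machine. Note that the view timing only covers creating the view object; no element data is copied at that point, so the result is a strided, non-contiguous array.

```python
import timeit
import numpy as np

# Hypothetical array size; values are kept in 0..255 as after clipping.
imgarray = np.random.randint(0, 256, size=(100_000, 3)).astype(np.uint32)

runs = 20
t_astype = timeit.timeit(lambda: imgarray.astype('B'), number=runs)
t_view = timeit.timeit(lambda: imgarray.view('B')[:, ::4], number=runs)

print(f"astype: {t_astype / runs * 1e6:.1f} us per call")
print(f"view:   {t_view / runs * 1e6:.1f} us per call")

# The view steps over 3 of every 4 bytes, so it is not contiguous in memory.
strided = imgarray.view('B')[:, ::4]
print(strided.flags['C_CONTIGUOUS'])   # False
```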
