I have read in a number of different places that numpy.take
is a much faster alternative to fancy indexing, for example here and here.
However, I am not finding this to be the case... at all. Here is an example from when I was poking around my code during some debugging:
knn_idx
Out[2]:
array([ 3290, 5847, 7682, 6957, 22660, 5482, 22661, 10965, 7,
1477, 7681, 3, 17541, 15717, 9139, 1475, 14251, 4400,
7680, 9140, 4758, 22289, 7679, 8407, 20101, 15718, 15716,
8405, 15710, 20829, 22662], dtype=uint32)
%timeit X.take(knn_idx, axis=0)
100 loops, best of 3: 3.14 ms per loop
%timeit X[knn_idx]
The slowest run took 60.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.48 µs per loop
X.shape
Out[5]:
(23011, 30)
X.dtype
Out[6]:
dtype('float64')
Which shows that fancy indexing is much faster! Using numpy.arange
to generate the indices I get a similar result:
idx = np.arange(0, len(X), 100)
%timeit X.take(idx, axis=0)
100 loops, best of 3: 3.04 ms per loop
%timeit X[idx]
The slowest run took 9.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 20.7 µs per loop
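For reference, here is a self-contained version of this benchmark (X is stand-in random data with the same shape and dtype as above; absolute timings will of course vary by machine):

import numpy as np
import timeit

X = np.random.rand(23011, 30)   # stand-in for the real data
idx = np.arange(0, len(X), 100)

n = 10000
t_take = timeit.timeit(lambda: X.take(idx, axis=0), number=n)
t_fancy = timeit.timeit(lambda: X[idx], number=n)
print('take:  %.2f µs per loop' % (t_take / n * 1e6))
print('fancy: %.2f µs per loop' % (t_fancy / n * 1e6))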
Why is fancy indexing so much faster than numpy.take
here? Am I hitting some kind of edge case?
I'm using Python 3.6 through Anaconda; here is my numpy configuration in case it's relevant:
np.__version__
Out[11]:
'1.11.3'
np.__config__.show()
blas_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
blas_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
openblas_lapack_info:
    NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
1 Answer
#1
In my tests take is modestly faster, but with such small times and the 'cached' warning I don't put a lot of stock in the difference:
In [192]: timeit X.take(idx2, axis=0).shape
The slowest run took 23.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.66 µs per loop
In [193]: timeit X[idx2,:].shape
The slowest run took 16.75 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.58 µs per loop
But your index array is uint32. That worked fine with indexing, but take gave me a casting error, so my idx2 is astype(int).
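As a sketch of that cast issue (a hypothetical reconstruction; whether take raises here depends on the numpy version and on whether the platform's native index type is 32- or 64-bit):

import numpy as np

X = np.random.rand(23011, 30)               # stand-in data
knn_idx = np.array([3290, 5847, 7682], dtype=np.uint32)

X[knn_idx]                                  # fancy indexing accepts uint32 indices
try:
    X.take(knn_idx, axis=0)                 # may raise a TypeError under 'safe' casting
except TypeError as e:
    print(e)

idx2 = knn_idx.astype(int)                  # the workaround used above
X.take(idx2, axis=0)                        # fine with native ints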
And with the arange idx, the times are 11.5 µs and 16 µs.
Notice I'm timing with a .shape appended; I'm not entirely sure whether that makes a difference.
I don't know why you are getting ms times for your take. It feels like more of a timing issue than an actual difference in take.
I don't think the libraries, BLAS, etc. will make a difference. The underlying task is basically the same: step through the data buffer and copy out selected bytes. There's no complicated computation to farm out. But I haven't studied the C code for take.
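Whichever path is faster, both select the same rows and both return fresh copies, which is easy to sanity-check on stand-in data:

import numpy as np

X = np.random.rand(23011, 30)   # stand-in data
idx = np.arange(0, len(X), 100)

a = X.take(idx, axis=0)
b = X[idx]
print(np.array_equal(a, b))                            # True: same rows selected
print(np.shares_memory(X, a), np.shares_memory(X, b))  # False False: both are copies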
Numpy version '1.12.0', Linux, 4 GB refurbished desktop.