Theano 2.1.17 - Basics: Profiling Theano Functions

Source: http://deeplearning.net/software/theano/tutorial/profiling.html

Profiling Theano function

Note: this method replaces the old ProfileMode. Do not use ProfileMode anymore.

Besides checking for errors, another important task is profiling your code. For this, Theano uses flags and/or parameters that are passed to theano.function.

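The simplest way to profile a Theano function is to use the Theano flags described below. When the process exits, they cause the profiling information to be printed to stdout (the standard output stream).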

Configuring the profiler involves three flags:

Enabling the profiler is simple: just set the flag config.profile.

To enable the memory profiler, use the Theano flag config.profile_memory in addition to config.profile.

To profile the Theano optimization phase, use the Theano flag config.profile_optimizer in addition to config.profile.

You can use the Theano flags profiling.n_apply, profiling.n_ops and profiling.min_memory_size to change how much information is printed.
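As a minimal sketch of how the flags above combine, the helper below assembles a THEANO_FLAGS value and an environment for a child process. The helper names build_theano_flags and profiling_env are hypothetical and not part of Theano; only the flag names come from the text above, and the values chosen are purely illustrative.

```python
import os


def build_theano_flags(flags):
    """Join a dict of flag names and values into a THEANO_FLAGS string."""
    # THEANO_FLAGS is a comma-separated list of key=value pairs.
    return ",".join("{}={}".format(key, value) for key, value in flags.items())


def profiling_env(flags, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = build_theano_flags(flags)
    return env


# Illustrative values only; adjust them to how much detail you want printed.
example_flags = {
    "profile": True,
    "profile_memory": True,
    "profiling.n_apply": 5,
    "profiling.n_ops": 5,
}
flags_string = build_theano_flags(example_flags)
print(flags_string)
```

The resulting environment could then be handed to something like subprocess.run([...], env=profiling_env(example_flags)) to launch a script with profiling enabled, which has the same effect as prefixing the command with THEANO_FLAGS=... in a shell.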

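The profiler outputs one profile per Theano function, plus one profile that is the sum of the printed profiles. Each profile contains four sections: global info, class info, Ops info and Apply node info.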

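In the global section, "Message" is the name of the Theano function. theano.function() has an optional parameter name that defaults to None; giving it a meaningful value helps when you profile many Theano functions. This section also shows how many times the function was called and the total time spent in all of those calls. The time spent in Function.fn.__call__ and in thunks is useful for understanding Theano's overhead.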
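To make the overhead concrete, the following sketch recomputes the percentages that appear in the global section of the example output in this article. The variable and function names are illustrative; the three timings are copied from that output.

```python
# Timings copied from the global section of the example output (seconds).
time_in_call = 5.698204e-05      # Time in 1 calls to Function.__call__
time_in_fn_call = 1.192093e-05   # Time in Function.fn.__call__
time_in_thunks = 6.198883e-06    # Time in thunks


def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


fn_share = share(time_in_fn_call, time_in_call)
thunk_share = share(time_in_thunks, time_in_call)
# Whatever is not spent in thunks is overhead around the actual computation.
overhead_share = 100.0 - thunk_share

print("Function.fn.__call__: {:.3f}%".format(fn_share))
print("thunks: {:.3f}%".format(thunk_share))
print("overhead outside thunks: {:.3f}%".format(overhead_share))
```

For a function this small, most of the call time is spent outside the thunks, which is why the absolute numbers say more about Theano's fixed per-call cost than about the computation itself.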

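We also see the time spent in the two phases of compilation: optimization (modifying the graph to make it more stable or faster) and linking (compiling the C code and building the Python callable that theano.function returns).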

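The class, Ops and Apply node sections carry the same kind of information: details about the Apply nodes that ran. The Ops section takes the information from the Apply section and merges the Apply nodes that have exactly the same Op. If two Apply nodes in the graph have Ops that compare equal, they are merged. Some Ops, such as Elemwise, do not compare equal when their parameters differ (the scalar operation being executed). As a result, the class section merges more Apply nodes than the Ops section.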
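The merging described above can be sketched in plain Python. This is only an illustration of the grouping idea, not Theano's implementation: the ApplyRecord tuple and the merge_by helper are hypothetical, and the timings are the per-call values from the Apply section of the example output in this article.

```python
from collections import OrderedDict, namedtuple

# Hypothetical record for one Apply node: its Op class, its Op, and its time.
ApplyRecord = namedtuple("ApplyRecord", ["op_class", "op", "seconds"])

apply_nodes = [
    ApplyRecord("Elemwise", "Elemwise{add,no_inplace}", 3.10e-06),
    ApplyRecord("Elemwise", "Elemwise{mul,no_inplace}", 2.15e-06),
    ApplyRecord("Elemwise", "Elemwise{add,no_inplace}", 9.54e-07),
]


def merge_by(records, field):
    """Sum the time and count the Apply nodes that share the same field value."""
    merged = OrderedDict()
    for record in records:
        key = getattr(record, field)
        seconds, count = merged.get(key, (0.0, 0))
        merged[key] = (seconds + record.seconds, count + 1)
    return merged


ops_section = merge_by(apply_nodes, "op")          # two rows: add and mul
class_section = merge_by(apply_nodes, "op_class")  # one row: Elemwise

for name, (seconds, count) in ops_section.items():
    print("{:<28} {:.2e}s over {} apply node(s)".format(name, seconds, count))
for name, (seconds, count) in class_section.items():
    print("{:<28} {:.2e}s over {} apply node(s)".format(name, seconds, count))
```

Grouping by Op keeps add and mul apart because their scalar operations differ, while grouping by class folds all three Apply nodes into a single Elemwise row, mirroring the row counts in the example output.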

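Here is an example of the output when some Theano optimizations are disabled, which makes the difference between the sections easier to see. With all optimizations enabled, only one op would be left in the graph.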

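Note:

To profile peak memory usage on the GPU, you need to:

* In the file theano/sandbox/cuda/cuda_ndarray.cu, set the macro

  COMPUTE_GPU_MEM_USED to 1.

* Then call theano.sandbox.cuda.theano_allocated()

  It returns a tuple of two ints. The first is the GPU memory currently allocated by Theano; the second is the peak GPU memory that Theano has allocated.

Do not leave this macro enabled all the time, because it slows down memory allocation and freeing. Since that also slows down computation, it distorts speed profiling, so do not use both at the same time.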

To run the example:

THEANO_FLAGS=optimizer_excluding=fusion:inplace,profile=True python doc/tutorial/profiling_example.py

The output:

Function profiling

==================

  Message: None

  Time in 1 calls to Function.__call__: 5.698204e-05s

  Time in Function.fn.__call__: 1.192093e-05s (20.921%)

  Time in thunks: 6.198883e-06s (10.879%)

  Total compile time: 3.642474e+00s

    Theano Optimizer time: 7.326508e-02s

       Theano validate time: 3.712177e-04s

    Theano Linker time (includes C, CUDA code generation/compiling): 9.584920e-01s

Class

---

<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>

  100.0%   100.0%       0.000s       2.07e-06s     C        3        3   <class 'theano.tensor.elemwise.Elemwise'>

   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops

---

<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>

  65.4%    65.4%       0.000s       2.03e-06s     C        2        2   Elemwise{add,no_inplace}

  34.6%   100.0%       0.000s       2.15e-06s     C        1        1   Elemwise{mul,no_inplace}

   ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)

Apply

------

<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>

  50.0%    50.0%       0.000s       3.10e-06s      1     0   Elemwise{add,no_inplace}(x, y)

  34.6%    84.6%       0.000s       2.15e-06s      1     2   Elemwise{mul,no_inplace}(TensorConstant{(1,) of 2.0}, Elemwise{add,no_inplace}.0)

  15.4%   100.0%       0.000s       9.54e-07s      1     1   Elemwise{add,no_inplace}(Elemwise{add,no_inplace}.0, z)

   ... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)

