References
http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/ug_docs/index.htm
amplxe-cl -collect hotspots -- ./driver /home/zxx/work_autumn_2011/matrices/rma10.mtx
Reading sparse matrix from file (/home/zxx/work_autumn_2011/matrices/rma10.mtx): done
Using 46835-by-46835 matrix with 2374001 nonzero values
------------------------------------------
#### Testing COO Kernels ####
creating coo_matrix:coo transform time elapsed 0.013690
do coo spmv time elapsed 5.434732 seconds
orignal do coo spmv time elapsed 5.429192 seconds
Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'
Executing actions 75 % Generating a report
Summary
-------
Elapsed Time: 11.312
CPU Time: 11.280
Executing actions 100 % done
amplxe-cl -report hotspots -result-dir r001hs
Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'
Executing actions 75 % Generating a report
Function Module CPU Time
__spmv_coo_serial_host_sse driver 5.420
__spmv_coo_serial_host<unsigned int, double> driver 5.410
read_coo_matrix<unsigned int, double> driver 0.350
test_coo_matrix_kernels<unsigned int, double> driver 0.060
coo_to_csr<unsigned int, double> driver 0.020
csr_to_coo<unsigned int, double> driver 0.020
Executing actions 100 % done
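The hotspots table above is easier to read as percentages. A minimal sketch, assuming the report rows are piped in as plain text in the "function, module, CPU time" layout shown above; `hotspot_shares` is a hypothetical helper name, not part of amplxe-cl. Template function names contain spaces, so only the last field is treated as the time.

```shell
#!/bin/sh
# hotspot_shares (hypothetical helper): print each function's share of the
# summed CPU time. Feed it only the hotspots rows; progress lines and the
# header are skipped because their last field is not a decimal number.
hotspot_shares() {
    awk '
        $NF ~ /^[0-9]+\.[0-9]+$/ {
            n++
            row[n] = $0        # keep the original row for printing
            secs[n] = $NF      # CPU time is always the last field
            total += $NF
        }
        END {
            if (total == 0) exit   # nothing to report, avoid dividing by zero
            for (i = 1; i <= n; i++)
                printf "%5.1f%%  %s\n", 100 * secs[i] / total, row[i]
        }
    '
}

# Usage with the rows from the report above.
hotspot_shares <<'EOF'
Function Module CPU Time
__spmv_coo_serial_host_sse driver 5.420
__spmv_coo_serial_host<unsigned int, double> driver 5.410
read_coo_matrix<unsigned int, double> driver 0.350
test_coo_matrix_kernels<unsigned int, double> driver 0.060
coo_to_csr<unsigned int, double> driver 0.020
csr_to_coo<unsigned int, double> driver 0.020
EOF
```

For this run the two SpMV kernels each account for roughly 48% of the 11.280 s of CPU time, and reading the matrix for about 3%.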
amplxe-cl -report summary -result-dir r001hs
Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'
Executing actions 75 % Generating a report
Summary
-------
Elapsed Time: 11.312
CPU Time: 11.280
Executing actions 100 % done
This is the same summary that is printed at the end of the collect run.
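The collect-then-report sequence used so far can be wrapped in a small script. This is only a sketch: `run` and `profile_hotspots` are hypothetical helper names, and it assumes that `-result-dir` is also accepted by the collect action so that the result directory is known in advance (the runs above let amplxe-cl pick the name). By default it only prints the commands.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default here), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_hotspots RESULT_DIR APP [ARGS...]
# Collect hotspots for APP, then show the hotspots and summary reports.
# Each step runs only if the previous one succeeded.
profile_hotspots() {
    result_dir=$1
    shift
    run amplxe-cl -collect hotspots -result-dir "$result_dir" -- "$@" &&
    run amplxe-cl -report hotspots -result-dir "$result_dir" &&
    run amplxe-cl -report summary -result-dir "$result_dir"
}

# Usage (dry run, prints the three commands):
#   profile_hotspots r001hs ./driver /home/zxx/work_autumn_2011/matrices/rma10.mtx
# Real run:
#   DRY_RUN=0 profile_hotspots r001hs ./driver /home/zxx/work_autumn_2011/matrices/rma10.mtx
```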
This example runs the hardware event-based sampling collector for the sample application and displays the default summary report.
$ amplxe-cl -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.CORE,CPU_CLK_UNHALTED.REF,INST_RETIRED.ANY home/test/sample
Commonly used options
collect
event-config
knob
$ amplxe-cl -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.CORE,CPU_CLK_UNHALTED.REF,INST_RETIRED.ANY home/test/sample
Viewing the report for this kind of run is a bit special.
Currently, the only way to view the sample-after values is to display the results of a run with the default values using the 'sfdump' report type, e.g.,
$ amplxe-cl -report sfdump -result-dir r000rs
A sample-after value can also be set per event by appending :sa=<value> to the event name:
sudo amplxe-cl -collect-with runsa -knob event-config=UOPS_EXECUTED.PORT2_CORE:sa=1000,UOPS_EXECUTED.PORT3_CORE:sa=1000,UOPS_EXECUTED.PORT4_CORE:sa=1000 -- ./driver
In my experience, sa should be >= 1000; otherwise the machine tends to hang.
I tried sa=100 and sa=1, and the machine hung both times.
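To avoid repeating that mistake, the event-config string can be built by a helper that rejects small sample-after values. A minimal sketch: `build_event_config` and `MIN_SA` are hypothetical names, and the floor of 1000 is only the empirical value from these notes, not a documented limit.

```shell
#!/bin/sh
MIN_SA=1000   # empirical floor from the notes above, not an official limit

# build_event_config SA EVENT [EVENT...]
# Prints "EVENT:sa=SA,EVENT:sa=SA,..." or fails if SA is invalid or too small.
build_event_config() {
    sa=$1
    shift
    case $sa in
        ''|*[!0-9]*)
            echo "sa must be a non-negative integer, got: $sa" >&2
            return 1 ;;
    esac
    if [ "$sa" -lt "$MIN_SA" ]; then
        echo "refusing sa=$sa (< $MIN_SA): the machine may hang" >&2
        return 1
    fi
    if [ "$#" -eq 0 ]; then
        echo "no events given" >&2
        return 1
    fi
    config=
    for event in "$@"; do
        # Join entries with commas, without a leading comma on the first one.
        config="${config:+$config,}$event:sa=$sa"
    done
    printf '%s\n' "$config"
}

# Usage:
#   cfg=$(build_event_config 1000 UOPS_EXECUTED.PORT2_CORE UOPS_EXECUTED.PORT3_CORE) &&
#   sudo amplxe-cl -collect-with runsa -knob event-config="$cfg" -- ./driver
```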
$ amplxe-cl -report hw-events -r r010runsa/
This report type works well for viewing the results of raw (native) hardware events.
This option enables multiple runs to achieve more precise results for hardware event-based collections.
When disabled, the collector uses event multiplexing.
sudo amplxe-cl -collect-with runsa -knob event-config=UOPS_EXECUTED.PORT2_CORE,UOPS_EXECUTED.PORT3_CORE,UOPS_EXECUTED.PORT4_CORE -- ./driver
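After using this option, the collection could not be run a second time.
The measured results are not very accurate, which is frustrating.
I do not know why yet. I need to learn computer architecture and operating systems properly
and track down the cause.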