Some HPC notes collected from the web

Source: https://computing.llnl.gov

Factors that determine a large-scale program's performance

* Application-related factors:
　　* Algorithms
　　* Dataset size
　　* Memory usage pattern
　　* Use of I/O
　　* Communication patterns
　　* Task granularity
　　* Load balancing
　　* Amdahl's Law
* Hardware factors:
　　* Processor architecture
　　* Memory hierarchy
　　* I/O configuration
　　* Network
* Software factors:
　　* OS
　　* Compiler
　　* Preprocessor
　　* Communication protocols
　　* Libraries
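Amdahl's Law, listed among the application factors above, can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function name `amdahl_speedup` and the 95% parallel fraction are assumptions chosen for the example, not values from the source.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when `parallel_fraction` of the runtime scales across `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part is divided by the worker count.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical program: 95% of the runtime parallelizes perfectly.
    for n in (1, 8, 64, 1024):
        print(f"{n:>5} workers -> speedup {amdahl_speedup(0.95, n):.2f}")
```

With a 5% serial part the speedup can never exceed 1 / 0.05 = 20, no matter how many workers are added, which is why the serial fraction belongs on the list of performance factors.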

Performance analysis:

　　Timers, profilers, system statistics, memory tools
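As a rough illustration of the first two tools, the sketch below times one call with a wall-clock timer and then profiles the same call with Python's standard `cProfile` module. The workload `sum_of_squares` and the helper names are hypothetical stand-ins; a real HPC code would more likely use the timers and profilers that come with its compiler or MPI stack.

```python
import cProfile
import io
import pstats
import time


def sum_of_squares(n: int) -> int:
    """Hypothetical workload standing in for a real compute kernel."""
    return sum(i * i for i in range(n))


def time_call(func, *args):
    """Return (result, elapsed wall-clock seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def profile_call(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return a report of the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    _, seconds = time_call(sum_of_squares, 1_000_000)
    print(f"wall-clock time: {seconds:.4f} s")
    print(profile_call(sum_of_squares, 1_000_000))
```

A timer answers "how long did it take", while the profile answers "where did the time go"; the two are usually used together.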

Learn something about hardware architecture:

Intel Xeon 5500/5600
　　4-core / 6-core
　　2.4 / 2.8 GHz
　　Cache
　　　　L1 data: 32 KB, private
　　　　L1 instruction: 32 KB, private
　　　　L2: 256 KB, private
　　　　L3: 8 MB / 12 MB, shared
　　CPU-memory bandwidth: 32 GB/s

Intel Xeon E5-2670
　　8-core, 2.6 GHz
　　Cache
　　　　L1 data: 32 KB, private
　　　　L1 instruction: 32 KB, private
　　　　L2: 256 KB, private
　　　　L3: 20 MB, shared
　　CPU-memory bandwidth: 51.2 GB/s

AMD processors
　　2.2 GHz
　　Cache
　　　　L1 data: 64 KB (2-way)
　　　　L1 instruction: 64 KB (2-way)
　　　　L2: 512 KB, private
　　　　L3: 2 MB, shared
　　Direct Connect Architecture
　　　　CPU-memory bandwidth: 10.7 GB/s per socket (Socket F)
　　　　Bandwidth to other sockets: 8 GB/s (2-way)

4x InfiniBand interconnect
　　* SDR: 1.25 GB/s
　　* DDR: 2.5 GB/s
　　* QDR: 5 GB/s
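To get a feel for what these peak figures mean, one can compute the shortest possible time to move a given amount of data at each rate. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes the figures above are in GB/s with 1 GB = 1e9 bytes, and the 8 GB payload and all names are hypothetical. Sustained bandwidth on real hardware is lower than peak.

```python
# Peak rates quoted in these notes, read as GB/s (an assumption; 1 GB = 1e9 bytes).
PEAK_GB_PER_S = {
    "Xeon 5500/5600 CPU-memory": 32.0,
    "Xeon E5-2670 CPU-memory": 51.2,
    "AMD CPU-memory (per socket)": 10.7,
    "4x InfiniBand SDR": 1.25,
    "4x InfiniBand DDR": 2.5,
    "4x InfiniBand QDR": 5.0,
}


def min_transfer_seconds(num_bytes: int, gb_per_s: float) -> float:
    """Lower bound on the time to move num_bytes at the given peak rate."""
    if gb_per_s <= 0:
        raise ValueError("gb_per_s must be positive")
    return num_bytes / (gb_per_s * 1e9)


if __name__ == "__main__":
    payload = 8 * 10**9  # hypothetical 8 GB working set
    for name, rate in PEAK_GB_PER_S.items():
        print(f"{name:<30} {min_transfer_seconds(payload, rate):8.3f} s")
```

The gap between memory and interconnect rates is the practical reason communication patterns and task granularity appear among the performance factors: data that must cross the network costs far more per byte than data that stays in local memory.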

Learn something about NUMA　　

　　- Physical layout: each node has several (2-4) sockets, and each socket has several (4-8) CPU cores. Cores on the same socket share the L3 cache; socket-to-socket communication goes through the CPU-memory bus and is roughly 2x to 5x slower.

　　- Design considerations: CPU affinity (numactl --cpunodebind), a local memory policy, and other compiler/runtime options (mpirun --bind-to-socket -bynode).
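CPU affinity can also be set from inside a program. The sketch below is a minimal, Linux-only illustration: it reads the CPUs of a NUMA node from sysfs and pins the calling process to them with `os.sched_setaffinity`. The helper names are hypothetical, and the code sets CPU affinity only; it does not set a memory policy, which is what `numactl` or libnuma would be used for.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_numa_node(node: int) -> set[int]:
    """Read the CPUs that belong to one NUMA node from Linux sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as handle:
        return parse_cpulist(handle.read())


def pin_current_process(cpus: set[int]) -> set[int]:
    """Restrict the calling process to `cpus` (Linux only) and return the new affinity."""
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumes a Linux machine that exposes NUMA node 0 in sysfs.
    print("now running on CPUs:", sorted(pin_current_process(cpus_of_numa_node(0))))
```

Pinning keeps a process on the cores of one socket, so that memory it touches afterwards is typically allocated on that socket's local node rather than reached across the slower socket-to-socket link.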

Finally, and most importantly: choose a good algorithm.
