Next-generation high-performance computing requires _______?

Date: 2022-12-14 07:40:30

In the PC/single-core era, we assumed flops were expensive. Accordingly, the "best" practice followed to date has been to trade frequent memory access for fewer flops; the philosophy of code/algorithm optimization was to reduce the flop count. However, with the current trend toward multi-/many-core hardware, flops are becoming the next round of "free lunch," whereas data movement is emerging as the new bottleneck, because the energy consumed by moving data is roughly 100X that of a flop. Hence, from the viewpoint of hardware efficiency and energy effectiveness, the more flops per unit of data movement, the better. For future extreme-scale simulation, S&E applications will tend to adopt numerical algorithms that maximize the number of flops per unit of data movement, so as to take full advantage of the "free flops." This stands in stark contrast with the programming habits and ways of thinking of the past. An algorithm deemed too expensive in the PC/single-core era may, conversely, be a good candidate on emerging computer architectures; tomorrow may well see a paradigm shift in what counts as a "good" numerical algorithm.
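
To make "flops per unit of data movement" concrete, here is a minimal C sketch of my own (not from the quoted text): both versions do the same flops, but the fused one skips the stored intermediate, so it moves far less data per flop, i.e. it has higher arithmetic intensity.

```c
#include <stddef.h>

/* Two ways to compute y[i] = a*x[i]^2 + b*x[i] + c over n elements. */

/* Unfused: materialize t = x*x, then combine. Streams about 5n
   doubles (read x, write t; then read t, read x, write y). */
void poly_unfused(double *y, double *t, const double *x,
                  double a, double b, double c, size_t n) {
    for (size_t i = 0; i < n; i++) t[i] = x[i] * x[i];
    for (size_t i = 0; i < n; i++) y[i] = a * t[i] + b * x[i] + c;
}

/* Fused: recompute x[i]*x[i] instead of storing it. Streams about
   2n doubles (read x, write y) for the same flop count. */
void poly_fused(double *y, const double *x,
                double a, double b, double c, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] * x[i] + b * x[i] + c;
}
```

On flop-rich, bandwidth-poor hardware, the fused version is the "good" algorithm even though it recomputes values the unfused one had already produced.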

As the power of modern supercomputing systems continues to advance toward extreme scales at an exciting pace, it is clear that the associated numerical-software development challenges are also increasingly formidable. On emerging architectures, memory and data motion present increasingly serious bottlenecks, as low-power-consumption requirements lead to systems with significant restrictions on available memory and communication bandwidth. Given this trend in hardware, if no change is made to the key numerical algorithms, wasting floating-point capability seems unavoidable. Consequently, computational-science experts in multiple application domains will need to revisit key application algorithms and solvers, with the likelihood that new capabilities will be demanded in order to keep up with the dramatic architectural changes that accompany the impressive increases in compute power.


Before: increase data copying as much as possible in order to reduce computation.

That is, trading space for time.


Now: minimize data copying as much as possible, and increase the computation done per data movement.

That is, trading time for space.
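
As an aside, the classic "space for time" trade this note refers to looks like the following in C; a minimal sketch with a hypothetical sine table, the names are mine.

```c
#include <math.h>

#define TABLE_SIZE 1024
#define TWO_PI 6.28318530717958647692

/* Precomputed table: spend memory once so later calls are cheap. */
static double sin_table[TABLE_SIZE];

void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        sin_table[i] = sin(TWO_PI * i / TABLE_SIZE);
}

/* Space for time: one memory load replaces a sin() evaluation. */
double table_sin(int i) { return sin_table[i % TABLE_SIZE]; }

/* Time for space: recompute on every call, no table needed. */
double recompute_sin(int i) { return sin(TWO_PI * i / TABLE_SIZE); }
```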


Is that right?


OK, after reading the English carefully, I realize I had it wrong.


Before, compute resources were the scarce ones: programs did lots of memory accesses in order to get at the compute resources. This is what I learned earlier, all kinds of round-robin scheduling and preemption contending for compute resources, which in turn generate all kinds of memory accesses. I have not come up with a better way to put it yet.

Now the situation is that compute resources are plentiful, while memory access is limited by bandwidth, and moving data costs enormous energy. The criterion for a good algorithm is how much computation happens per data movement; the larger this ratio, the better.
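
This ratio is what the roofline model (Williams et al.) calls arithmetic intensity: flops per byte moved. Attainable performance is bounded by min(peak, bandwidth × intensity). A minimal C sketch; the machine numbers below are made-up placeholders, not any real system.

```c
#include <stdio.h>

int main(void) {
    double peak_gflops = 1000.0;  /* hypothetical peak compute rate */
    double bw_gbs      = 100.0;   /* hypothetical memory bandwidth  */

    /* A kernel is memory-bound when bw * I < peak,
       i.e. below the ridge point I = peak / bw. */
    double intensities[] = { 0.125, 1.0, 10.0, 100.0 };
    for (size_t i = 0; i < sizeof intensities / sizeof intensities[0]; i++) {
        double I = intensities[i];
        double bound = bw_gbs * I;
        double attainable = bound < peak_gflops ? bound : peak_gflops;
        printf("I = %7.3f flop/byte -> attainable %8.1f GF/s (%s)\n",
               I, attainable,
               bound < peak_gflops ? "memory-bound" : "compute-bound");
    }
    return 0;
}
```

With these placeholder numbers, any kernel below 10 flops/byte leaves floating-point capability idle, which is exactly the "waste" the English paragraph warns about.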


What I was thinking earlier: the ratio of memory speed/capacity to compute cost has passed a tipping point, so the direction of optimization has reversed.


No program is ever designed under unlimited resources. The things teachers kept stressing, million-user concurrency, simple processing pipelines, inefficient programs or languages (no segmented loading and the like), are all cases where the problem scale exceeds what the current system can hold and extra scheduling is required; this must have been a concern ever since programming was invented. It is just that it has become obvious now: one bottleneck is too severe, while utilization of the other resource is extremely low.