NVIDIA-Turing-Architecture-Whitepaper: NVIDIA Turing Architecture Whitepaper

Time: 2021-10-12 13:05:22
[File properties]:
File name: NVIDIA-Turing-Architecture-WhitepaperNVIDIA-图灵架构的白皮书
File size: 4.92 MB
File format: PDF
Last updated: 2021-10-12 13:05:22
NVIDIA Graphics
NVIDIA Turing GPU Architecture WP-09183-001_v01

TABLE OF CONTENTS

Introduction to the NVIDIA Turing Architecture
  NVIDIA Turing Key Features
    New Streaming Multiprocessor (SM)
    Turing Tensor Cores
    Real-Time Ray Tracing Acceleration
    New Shading Advancements
      Mesh Shading
      Variable Rate Shading (VRS)
      Texture-Space Shading
      Multi-View Rendering (MVR)
    Deep Learning Features for Graphics
    Deep Learning Features for Inference
    GDDR6 High-Performance Memory Subsystem
    Second-Generation NVIDIA NVLink
    USB-C and VirtualLink
Turing GPU Architecture In-Depth
  Turing TU102 GPU
  Turing Streaming Multiprocessor (SM) Architecture
    Turing Tensor Cores
  Turing Optimized for Datacenter Applications
  Turing Memory Architecture and Display Features
    GDDR6 Memory Subsystem
    L2 Cache and ROPs
    Turing Memory Compression
    Video and Display Engine
  USB-C and VirtualLink
  NVLink Improves SLI
Turing Ray Tracing Technology
  Turing RT Cores
NVIDIA NGX Technology
  NGX Software Architecture
  Deep Learning Super-Sampling (DLSS)
  InPainting
    AI Slow-Mo
    AI Super Rez
Turing Advanced Shading Technologies
  Mesh Shading
  Variable Rate Shading
    Content Adaptive Shading
    Motion Adaptive Shading
    Foveated Rendering
  Texture Space Shading
    The Mechanics of TSS
  Multi-View Rendering
    Multi-View Rendering Use Cases
  Resource Management and Binding Model
Turing Features Enhance Virtual Reality
Conclusion
Appendix A Turing TU104 GPU
Appendix B Turing TU106 GPU
Appendix C RTX-OPS Description
  The Hybrid Rendering Model
  RTX-OPS Workload-based Metric Explained
Appendix D Ray Tracing Overview
  Basic Ray Tracing Mechanics
    Bounding Volume Hierarchy
  Denoising Filtering

LIST OF FIGURES

Figure 1. Turing Reinvents Graphics
Figure 2. Turing TU102 Full GPU with 72 SM Units
Figure 3. NVIDIA Turing TU102 GPU
Figure 4. Turing TU102/TU104/TU106 Streaming Multiprocessor (SM)
Figure 5. Concurrent Execution of Floating Point and Integer Instructions in the Turing SM
Figure 6. New Shared Memory Architecture
Figure 7. Turing Shading Performance Speedup versus Pascal on Many Different Workloads
Figure 8. New Turing Tensor Cores Provide Multi-Precision for AI Inference
Figure 9. Tesla T4 delivers up to 40X Higher Inference Performance
Figure 10. Tesla T4 Delivers More than 50X the Energy Efficiency of CPU-based Inferencing
Figure 11. Turing GDDR6
Figure 12. 50% Higher Effective Bandwidth
Figure 13. Video Feature Enhancements
Figure 14. NVLink Enables New SLI Display Topologies
Figure 15. SOL MAN from NVIDIA SOL Ray Tracing Demo (See Demo)
Figure 16. Hybrid Rendering Pipeline
Figure 17. Details of Ray Tracing and Rasterization Pipeline Stages
Figure 18. From Reflections Demo
Figure 19. Ray Tracing Pre Turing
Figure 20. Turing Ray Tracing with RT Cores
Figure 21. Turing Ray Tracing Performance
Figure 22. Turing with 4K DLSS is Twice the Performance of Pascal with 4K TAA
Figure 23. DLSS 2X versus 64xSS image almost Indistinguishable
Figure 24. DLSS 2X Provides Significantly Better Temporal Stability and Image Clarity Than TAA
Figure 25. NGX InPainting Examples, Missing Image Data Is Intelligently Replaced with Meaningful Image Information
Figure 26. AI Super Rez Provides Improved Image Clarity Over Other Filtering Methods
Figure 27. Mesh Shading, Visually Rich Images
Figure 28. Current Graphics Pipeline versus a Graphics Pipeline with Task and Mesh Shaders
Figure 29. Screenshot from the Asteroid Field Demo
Figure 30. An Asteroid at Low and High Levels of Detail (LOD)
Figure 31. Dynamically Computed, Spherical Cutaway of a Koenigsegg Model, Viewed in NVIDIA Holodeck™
Figure 32. Turing VRS Supported Shading Rates and Example Application to a Game Frame
Figure 33. Example of Content Adaptive Shading
Figure 34. Perceived Blur Due to Object Motion Combined with Retinal and Display Persistence
Figure 35. Traditional Rasterization and Shading Process
Figure 36. Texture Space Shading Process
Figure 37. Texture Space Shading for Stereo
Figure 38. 200° FOV HMD Where Two Canted Panels are Used and Benefit from MVR
Figure 39. MVR Single Pass Cascaded Shadow Map Rendering
Figure 40. Turing Features for VR
Figure 41. Turing TU104 Full Chip Diagram
Figure 42. Turing TU106 Full Chip Diagram
Figure 43. Workload Distribution Over One Turing Frame Time
Figure 44. Peak Operations of Each Type Base for GTX 2080 Ti
Figure 45. Basic Ray Tracing Process
Figure 46. Abstraction of Tree Traversal and a Ray Intersecting Different Levels of Bounding Boxes
Figure 47. Shadow Map Percentage Closer Filtering (PCF) versus Ray Tracing with Denoising
Figure 48. Shadow Mapping Compared to Ray Traced Shadows that use 1 Sample Per Pixel and Denoising
Figure 49. Screen-Space Ambient Occlusion Compared to Ray-Traced Ambient Occlusion
Figure 50. RTX Ray Tracing
Figure 51. Scene from Battlefield V with RTX On and Off
Figure 52. Scene #2 from Battlefield V with RTX On and Off
Figure 53. Shadow of the Tomb Raider with RTX ON

LIST OF TABLES

Table 1. Comparison of NVIDIA Pascal GP102 and Turing TU102
Table 2. Enhanced Video Engine, Tesla P4 versus Tesla T4
Table 3. DisplayPort Support in Turing GPUs
Table 4. Comparison of NVIDIA Pascal GP104 and Turing TU104 GPUs
Table 5. Comparison of the Pascal Tesla P4 and the Turing Tesla T4
Table 6. Comparison of NVIDIA Pascal GP104 to Turing TU106 GPUs
