文件名称:C++多线程编程
文件大小:68KB
文件格式:DOC
更新时间:2014-11-03 09:06:25
多线程
The first thing we do is choose a device and check to see whether it supports a feature known as device overlap. A GPU supporting device overlap possesses the capacity to simultaneously execute a CUDA C kernel while performing a copy between device and host memory. As we’ve promised before, we’ll use multiple streams to achieve this overlap of computation and data transfer, but first we’ll see how to create and use a single stream. As with all of our examples that aim to measure performance improvements (or regressions), we begin by creating and