In addition to cudaMalloc, introduced earlier, this section briefly covers a few other commonly used memory allocation functions: cudaMallocPitch, cudaMalloc3D, and others.

1. cudaMallocPitch

cudaError_t  cudaMallocPitch(void **devPtr, size_t *pitch, size_t width, size_t height);

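This function allocates at least width bytes by height rows of linear memory on the device. It is the recommended way to allocate a 2D array, rather than the cudaMalloc function introduced earlier, because it may pad each row with extra bytes to satisfy alignment requirements. This keeps accesses along a row, and copies between the 2D array and other regions of device memory (with cudaMemcpy2D and similar functions), at their best performance. The real size of the allocation is therefore pitch * height bytes (pitch is already in bytes, so no sizeof(T) factor is needed). For a 2D array of type T whose start address BaseAddress was returned in devPtr, element [Row, Column] is located with:

T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;

The cast to char* is required because pitch is a byte offset, whereas Column counts elements of type T.

Parameter 1, devPtr (void**): receives the start address of the allocated memory.

Parameter 2, pitch (size_t*): receives the actual row pitch, i.e. the padded width of each row in bytes; it is always greater than or equal to width.

Parameter 3, width (size_t): the requested width in bytes, i.e. the number of columns of the 2D array multiplied by sizeof(T).

Parameter 4, height (size_t): the requested height in rows, i.e. the number of rows of the 2D array (a count, not a byte size).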
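To make the pitch arithmetic concrete without needing a GPU, here is a minimal host-side sketch. It rounds a row width up to a fixed alignment, locates an element with the formula above, and copies rows the way cudaMemcpy2D is documented to (width bytes from each of height rows, stepping the source by spitch and the destination by dpitch). The 512-byte alignment and the helper names roundUpPitch, pitchedElement and copy2D are assumptions for illustration only; a real program must use the pitch that cudaMallocPitch returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// ASSUMPTION: a fixed 512-byte row alignment, chosen only for illustration.
// The padding actually chosen by cudaMallocPitch is device dependent.
constexpr std::size_t kAssumedAlignment = 512;

// Hypothetical helper: round a requested row width (bytes) up to a multiple
// of the alignment, the way a pitched allocator pads each row.
std::size_t roundUpPitch(std::size_t widthBytes,
                         std::size_t alignment = kAssumedAlignment) {
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: element [row][col] of a pitched 2D array of T.
// Advance row * pitch BYTES (hence the char* cast), then col ELEMENTS.
template <typename T>
T* pitchedElement(void* base, std::size_t pitch, std::size_t row, std::size_t col) {
    return reinterpret_cast<T*>(static_cast<char*>(base) + row * pitch) + col;
}

// Hypothetical host analogue of cudaMemcpy2D's row-by-row semantics:
// copy widthBytes from each of `height` rows. Bytes past widthBytes in each
// destination row (the padding) are not written.
void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto* d = static_cast<char*>(dst);
    const auto* s = static_cast<const char*>(src);
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(d + r * dpitch, s + r * spitch, widthBytes);
    }
}
```

For example, a tightly packed 10 x 10 float matrix on the host has a source pitch of 40 bytes; under the assumed 512-byte alignment the padded pitch is 512, so row 3, column 4 lies 3 * 512 + 4 * 4 = 1552 bytes from the base address.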

2. cudaMalloc3D

cudaError_t cudaMalloc3D(struct cudaPitchedPtr* pitchedDevPtr, struct cudaExtent extent);

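This function allocates a logical 1D, 2D, or 3D memory object on the device. Like cudaMallocPitch, it may pad each row with extra bytes for best performance.

The first parameter, pitchedDevPtr (cudaPitchedPtr*), is an output parameter that receives a description of the allocated device memory. The structure is defined as follows: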

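struct cudaPitchedPtr
{
void   *ptr;      // address of the allocated device memory
size_t  pitch;    // actual (padded) width of each row, in bytes
size_t  xsize;    // logical width, i.e. the usable width requested, in bytes
size_t  ysize;    // logical height, i.e. the usable height requested, in rows
};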

The second parameter, extent (cudaExtent), is an input parameter describing the requested allocation: its width, height, and depth. The structure is defined as follows:

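struct cudaExtent
{
size_t width;     // requested width: in bytes for linear memory (in elements for CUDA arrays)
size_t height;    // requested height, in rows (a count, not bytes)
size_t depth;     // requested depth, in slices (a count, not bytes)
};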
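The two structures combine as follows when indexing into memory from cudaMalloc3D: one 2D slice spans pitch * height bytes, one row spans pitch bytes, and x counts elements within a row. The minimal host-side sketch below shows that arithmetic. PitchedPtrSketch and ExtentSketch are hypothetical stand-ins that mirror the fields listed above so the code runs without a GPU, and the 512-byte pitch used with it is an assumption rather than a value any device is guaranteed to return.

```cpp
#include <cassert>
#include <cstddef>

// HYPOTHETICAL host-side stand-ins mirroring cudaPitchedPtr and cudaExtent.
struct PitchedPtrSketch {
    void*       ptr;    // start of the allocation
    std::size_t pitch;  // padded row width in bytes
    std::size_t xsize;  // logical width in bytes
    std::size_t ysize;  // logical height in rows
};

struct ExtentSketch {
    std::size_t width;   // bytes (linear memory)
    std::size_t height;  // rows
    std::size_t depth;   // slices
};

// Bytes spanned by a pitched 3D allocation: each slice holds `height` padded rows.
std::size_t allocationBytes(const PitchedPtrSketch& p, const ExtentSketch& e) {
    return p.pitch * e.height * e.depth;
}

// Address of element (x, y, z) of type T: skip z slices of pitch * height bytes,
// then y rows of pitch bytes, then x elements. (ysize equals extent.height after
// cudaMalloc3D, so either could be used for the slice size.)
template <typename T>
T* element3D(const PitchedPtrSketch& p, const ExtentSketch& e,
             std::size_t x, std::size_t y, std::size_t z) {
    assert(x * sizeof(T) < e.width && y < e.height && z < e.depth);
    const std::size_t slicePitch = p.pitch * e.height;
    char* slice = static_cast<char*>(p.ptr) + z * slicePitch;
    return reinterpret_cast<T*>(slice + y * p.pitch) + x;
}
```

With a width of 40 bytes, height 22, depth 33 and an assumed pitch of 512 bytes, the allocation spans 512 * 22 * 33 = 371,712 bytes. Passing height and depth in bytes instead (22 * sizeof(float) and 33 * sizeof(float)) would make it 16 times larger.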

3. Sample Code

<pre name="code" class="cpp">#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

using namespace std;

// Print a readable message and exit if a CUDA runtime call fails.
static void checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess)
    {
        cerr << "call " << what << " failed: " << cudaGetErrorString(err) << endl;
        exit(EXIT_FAILURE);
    }
}

int main()
{
    // 1. cudaMallocPitch: a 10 x 10 float matrix.
    //    width is in BYTES (columns * sizeof(float)); height is in ROWS.
    float* pDeviceData = nullptr;
    size_t width = 10 * sizeof(float);
    size_t height = 10;
    size_t pitch = 0;

    checkCuda(cudaMallocPitch(&pDeviceData, &pitch, width, height), "cudaMallocPitch");
    cout << "width: " << width << endl;
    cout << "height: " << height << endl;
    cout << "pitch: " << pitch << endl;            // padded row width in bytes, >= width

    // 2. cudaMalloc3D: a 10 x 22 x 33 float volume.
    //    extent.width is in BYTES for linear memory; height and depth are counts.
    cudaPitchedPtr pitchPtr;
    cudaExtent extent = make_cudaExtent(10 * sizeof(float), 22, 33);

    checkCuda(cudaMalloc3D(&pitchPtr, extent), "cudaMalloc3D");
    cout << "\n\n";
    cout << "width: " << extent.width << endl;     // requested values
    cout << "height: " << extent.height << endl;
    cout << "depth: " << extent.depth << endl;

    cout << endl;
    cout << "pitch: " << pitchPtr.pitch << endl;   // actual (padded) row width in bytes
    cout << "xsize: " << pitchPtr.xsize << endl;   // logical width, equals extent.width
    cout << "ysize: " << pitchPtr.ysize << endl;   // logical height, equals extent.height

    cudaFree(pDeviceData);
    cudaFree(pitchPtr.ptr);
    cin.get();                                     // keep the console window open
    return 0;
}
</pre>

4. Output

5. Other Related Memory Management Functions

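Note that both allocation functions above allocate linear memory, not CUDA arrays (which are allocated with cudaMallocArray or cudaMalloc3DArray).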

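        cudaMemset3D                      // initializes a pitched 3D region of device memory to a byte value
        cudaMalloc3DArray                 // allocates a 1D, 2D, or 3D CUDA array
        cudaMallocArray                   // allocates a 1D or 2D CUDA array
        cudaFreeArray                     // frees a CUDA array
        cudaMallocHost(void**, size_t)    // allocates page-locked host memory
        cudaMallocHost (C API)
        cudaFreeHost                      // frees page-locked host memory
        cudaHostAlloc                     // allocates page-locked host memory, with flags
        make_cudaPitchedPtr               // creates a cudaPitchedPtr object
        make_cudaExtent                   // creates a cudaExtent object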
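Of the functions listed, cudaMemset3D works directly on the cudaPitchedPtr that cudaMalloc3D returns: it is documented to initialize extent.width bytes in each of extent.height rows of each of extent.depth slices, using the pointer's pitch and ysize to step between rows and slices. The host-side emulation below is a minimal sketch of that region arithmetic only. PitchedPtrSketch, ExtentSketch and memset3DSketch are hypothetical names, and the sketch makes no claim about how the real function treats padding bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// HYPOTHETICAL host-side stand-ins for cudaPitchedPtr / cudaExtent.
struct PitchedPtrSketch {
    void*       ptr;
    std::size_t pitch;  // padded row width in bytes
    std::size_t xsize;  // logical row width in bytes (unused by this sketch)
    std::size_t ysize;  // rows per 2D slice
};

struct ExtentSketch {
    std::size_t width;   // bytes per row to set
    std::size_t height;  // rows per slice to set
    std::size_t depth;   // slices to set
};

// Emulates the region cudaMemset3D initializes. This sketch writes only
// extent.width bytes per row, so padding bytes here are left unchanged.
void memset3DSketch(PitchedPtrSketch p, int value, ExtentSketch e) {
    assert(e.width <= p.pitch && e.height <= p.ysize);
    const std::size_t slicePitch = p.pitch * p.ysize;
    auto* base = static_cast<unsigned char*>(p.ptr);
    for (std::size_t z = 0; z < e.depth; ++z) {
        for (std::size_t y = 0; y < e.height; ++y) {
            std::memset(base + z * slicePitch + y * p.pitch, value, e.width);
        }
    }
}
```

For instance, with a 16-byte pitch, a 10-byte logical width, 3 rows and 2 slices, bytes 10 to 15 of every row fall outside the region and keep their previous contents in this sketch.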

秒客网

[CUDA] 2. Memory Allocation Functions
