I'm new to CUDA. I was trying to implement a trie data structure on the GPU, but it didn't work. I noticed that atomicAdd wasn't behaving as I expected, so I ran a small experiment with it. I wrote this piece of code:
#include <cstdio>

// __device__ int *a; // I also tried the code using this __device__ variable
// and allocating it inside the kernel instead of using cudaMalloc. Same result.
__global__ void AtomicTestKernel(int *a)
{
    *a = 0;
    __syncthreads();
    for (int i = 0; i < 2; i++)
    {
        if (threadIdx.x % 2)
        {
            atomicAdd(a, 1);
            printf("threadsIndex = %d\t&\ta : %d\n", threadIdx.x, *a);
        }
        else
        {
            atomicAdd(a, 1);
            printf("threadsIndex = %d\t&\ta : %d\n", threadIdx.x, *a);
        }
    }
}

int main()
{
    int *d_a;
    cudaMalloc((void **)&d_a, sizeof(int));
    AtomicTestKernel<<<1, 10>>>(d_a);
    cudaDeviceSynchronize();
    return 0;
}
Please correct me where I'm wrong about this code:
1 - According to the CUDA Programming Guide (on atomic functions):
... In other words, no other thread can access this address until the operation is complete
2 - The memory that int *d_a points to resides in global memory, and so does the memory behind the kernel's input int *a, because it is allocated with cudaMalloc (according to this 3-minute video: Udacity CUDA - Global Memory). Therefore all threads see the same int *a rather than each having its own.
3 - In the code, every printf is preceded by an atomicAdd, so I expect each printf to show a value of *a that differs from the previous one and is therefore unique.
But in the output I see many identical values of *a. This is the result I get:
threadsIndex = 0 & a : 5
threadsIndex = 2 & a : 5
threadsIndex = 4 & a : 5
threadsIndex = 6 & a : 5
threadsIndex = 8 & a : 5
threadsIndex = 1 & a : 10
threadsIndex = 3 & a : 10
threadsIndex = 5 & a : 10
threadsIndex = 7 & a : 10
threadsIndex = 9 & a : 10
threadsIndex = 0 & a : 15
threadsIndex = 2 & a : 15
threadsIndex = 4 & a : 15
threadsIndex = 6 & a : 15
threadsIndex = 8 & a : 15
threadsIndex = 1 & a : 20
threadsIndex = 3 & a : 20
threadsIndex = 5 & a : 20
threadsIndex = 7 & a : 20
threadsIndex = 9 & a : 20
Press any key to continue . . .
1 Answer
Since all threads in a warp execute each instruction at the same time, your code performs all of the atomicAdd calls in a branch and THEN performs the printf calls. As a consequence, each printf reads *a after all of those atomic operations have completed, which is why the threads print the same value.
Here is the execution of the instructions within a warp:
Instruction | threadId 1       | threadId 2       | *a
------------+------------------+------------------+----
atomicAdd   | increasing value | waiting          | 1
            | waiting          | increasing value | 2
------------+------------------+------------------+----
Warp finished the atomicAdd instruction for all threads
reading *a  | read value       | read value       | 2
To get the value that was stored before an atomic operation, use the return value of atomicAdd:
int previousValue = atomicAdd(a, 1);
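To make the difference concrete, here is a minimal sketch of the question's experiment reworked to keep the value returned by atomicAdd instead of re-reading *a afterwards. The names AtomicTicketKernel, CountUniqueTickets, counter and tickets are hypothetical and not part of the original code. The counter is zeroed with cudaMemset before the launch, which is an assumption of this sketch rather than something the question does, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel name. Each thread stores the value returned by its own
// atomicAdd calls instead of re-reading *counter afterwards.
__global__ void AtomicTicketKernel(int *counter, int *tickets)
{
    for (int i = 0; i < 2; i++)
    {
        // atomicAdd returns the value held at the address before the addition.
        int previousValue = atomicAdd(counter, 1);
        tickets[i * blockDim.x + threadIdx.x] = previousValue;
    }
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns how many distinct values the atomicAdd calls returned.
int CountUniqueTickets(int threads)
{
    const int total = 2 * threads; // two atomicAdd calls per thread
    int *d_counter = nullptr;
    int *d_tickets = nullptr;
    cudaMalloc((void **)&d_counter, sizeof(int));
    cudaMalloc((void **)&d_tickets, total * sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int)); // zero the counter before the launch

    AtomicTicketKernel<<<1, threads>>>(d_counter, d_tickets);
    cudaDeviceSynchronize();

    std::vector<int> tickets(total, -1);
    cudaMemcpy(tickets.data(), d_tickets, total * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_tickets);
    cudaFree(d_counter);

    // Count the returned values that lie in [0, total) and were not seen before.
    std::vector<bool> seen(total, false);
    int unique = 0;
    for (int value : tickets)
    {
        if (value >= 0 && value < total && !seen[value])
        {
            seen[value] = true;
            unique++;
        }
    }
    return unique;
}

int main()
{
    const int threads = 10; // same launch size as the question
    printf("unique values returned: %d of %d\n",
           CountUniqueTickets(threads), 2 * threads);
    return 0;
}
```

With a working device, all 20 returned values should be distinct, because each atomicAdd call observes a different previous value even though the threads run the instruction together. Which thread receives which value is not fixed and may change from run to run.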
You can find more information here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd