Why don't I see a different/unique value of the variable after each atomicAdd?

I'm new to CUDA. I was trying to implement a trie data structure on the GPU, but it didn't work. I noticed that atomicAdd wasn't behaving as I expected, so I ran an experiment with it. I wrote this piece of code:

#include <cstdio>

//__device__ int *a; //I also tried the code with using this __device__
                     //variable and allocating it inside kernel instead
                     //using cudaMalloc. Same Result

__global__ void AtomicTestKernel (int*a)
{
    *a = 0;
    __syncthreads();
    for (int i = 0; i < 2; i++)
    {
        if (threadIdx.x % 2)
        {
            atomicAdd(a, 1);
            printf("threadsIndex = %d\t&\ta : %d\n",threadIdx.x,*a);
        }
        else
        {
            atomicAdd(a, 1);
            printf("threadsIndex = %d\t&\ta : %d\n", threadIdx.x, *a);
        }
    }
}

int main()
{
    int * d_a;
    cudaMalloc((void**)&d_a, sizeof(int));

    AtomicTestKernel<<<1, 10>>>(d_a);

    cudaDeviceSynchronize();

    return 0;
}

Please correct me where I'm wrong about this code:

1 - According to the CUDA Programming Guide (on atomic functions):

... In other words, no other thread can access this address until the operation is complete

2 - The int *d_a resides in global memory, and so does the kernel's argument int *a, because it is allocated with cudaMalloc (according to this 3-minute video: Udacity CUDA - Global Memory). Therefore all threads see the same int *a rather than each having its own copy.

3 - In the code there is an atomicAdd before every printf, so I expect each printf to show a value of *a that differs from the previous one and is therefore unique.

BUT in the result I see many identical values of *a. This is the output I get:

threadsIndex = 0        &       a : 5
threadsIndex = 2        &       a : 5
threadsIndex = 4        &       a : 5
threadsIndex = 6        &       a : 5
threadsIndex = 8        &       a : 5
threadsIndex = 1        &       a : 10
threadsIndex = 3        &       a : 10
threadsIndex = 5        &       a : 10
threadsIndex = 7        &       a : 10
threadsIndex = 9        &       a : 10
threadsIndex = 0        &       a : 15
threadsIndex = 2        &       a : 15
threadsIndex = 4        &       a : 15
threadsIndex = 6        &       a : 15
threadsIndex = 8        &       a : 15
threadsIndex = 1        &       a : 20
threadsIndex = 3        &       a : 20
threadsIndex = 5        &       a : 20
threadsIndex = 7        &       a : 20
threadsIndex = 9        &       a : 20
Press any key to continue . . .

1 Answer

#1

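Threads in a warp execute each instruction together, so every thread on a given branch performs its atomicAdd before any of them reaches the printf. By the time *a is read for printing, it already holds the result of all of those atomic operations. That is why your output shows 5 for the five even threads, then 10 for the five odd threads, and so on.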

Here is how the instructions execute within a warp:

Instruction | threadId 1       | threadId 2       | *a
------------+------------------+------------------+----
atomicAdd   | increasing value | waiting          | 1
atomicAdd   | waiting          | increasing value | 2
------ warp has finished the atomicAdd instruction for all threads ------
reading *a  | read value       | read value       | 2

To get the value the variable held just before your own atomic operation, use the return value of atomicAdd:

int previousValue = atomicAdd(a, 1);
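As a rough sketch of how that return value could be used in the experiment from the question, the program below has each thread store and print the value returned by atomicAdd instead of re-reading *a. The names TicketKernel, collectTickets, counter and tickets are made up for this illustration, the counter is zeroed with cudaMemset on the host rather than inside the kernel, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Each thread records the value the counter held just BEFORE its own add.
// atomicAdd returns that old value, so no two threads receive the same one.
__global__ void TicketKernel(int *counter, int *tickets)
{
    int previousValue = atomicAdd(counter, 1);
    tickets[threadIdx.x] = previousValue;
    printf("threadIdx = %d\tprevious value = %d\n", threadIdx.x, previousValue);
}

// Hypothetical host helper (not from the original post): launches one block
// of nThreads threads and returns the sorted values returned by atomicAdd.
// Return codes of the CUDA calls are ignored to keep the sketch short.
std::vector<int> collectTickets(int nThreads)
{
    int *d_counter = nullptr;
    int *d_tickets = nullptr;
    cudaMalloc((void**)&d_counter, sizeof(int));
    cudaMalloc((void**)&d_tickets, nThreads * sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));  // start the counter at zero

    TicketKernel<<<1, nThreads>>>(d_counter, d_tickets);
    cudaDeviceSynchronize();

    std::vector<int> tickets(nThreads);
    cudaMemcpy(tickets.data(), d_tickets, nThreads * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_tickets);

    std::sort(tickets.begin(), tickets.end());
    return tickets;
}

int main()
{
    // Print the sorted values handed out to the 10 threads.
    std::vector<int> tickets = collectTickets(10);
    for (int t : tickets)
        printf("%d ", t);
    printf("\n");
    return 0;
}
```

With 10 threads, each value from 0 to 9 should be handed out exactly once. Which thread receives which value, and the order of the per-thread printf lines, is not specified.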

You can find more information here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd
