CUDA and nvcc: using the preprocessor to choose between float and double

Time: 2022-07-15 16:37:32

The problem:

I have a .h file in which I want to define real as double when compiling for C/C++ or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be defined as float.

After many hours I came up with this (which does not work):

#   if defined(__CUDACC__)

#       warning * making definitions for cuda

#       if defined(__CUDA_ARCH__)
#           warning __CUDA_ARCH__ is defined
#       else
#           warning __CUDA_ARCH__ is NOT defined
#       endif

#       if (__CUDA_ARCH__ >= 130)
#                       define real double
#                       warning using double in cuda
#       elif (__CUDA_ARCH__ >= 0)
#               define real float
#               warning using float in cuda
#               warning how the hell is this printed when __CUDA_ARCH__ is not defined?
#       else
#               define real 
#               error what the hell is the value of __CUDA_ARCH__ and how can I print it
#       endif

#   else
#       warning * making definitions for c/c++
#       define real double
#       warning using double for c/c++
#   endif

When I compile (note the -arch flag):

nvcc -arch compute_13  -Ilibcutil testFloatDouble.cu 

I get

* making definitions for cuda
__CUDA_ARCH__ is defined
using double in cuda

* making definitions for cuda
warning __CUDA_ARCH__ is NOT defined
warning using float in cuda
how the hell is this printed if __CUDA_ARCH__ is not defined now?

Undefined symbols for architecture i386:
  "myKernel(float*, int)", referenced from: ....

I know that nvcc compiles each file twice. The first pass is OK (__CUDACC__ defined and __CUDA_ARCH__ >= 130), but what happens the second time? Is __CUDACC__ defined but __CUDA_ARCH__ undefined, or defined with a value < 130? Why?

Thanks for your time.

2 Answers

#1


25  

It seems you might be conflating two things - how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.

Consider the following code snippet:

#ifdef __CUDACC__
#warning using nvcc

template <typename T>
__global__ void add(T *x, T *y, T *z)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;

    z[idx] = x[idx] + y[idx];
}

#ifdef __CUDA_ARCH__
#warning device code trajectory
#if __CUDA_ARCH__ > 120
#warning compiling with double precision
template void add<double>(double *, double *, double *);
#else
#warning compiling with single precision
template void add<float>(float *, float *, float *);
#endif
#else
#warning nvcc host code trajectory
#endif
#else
#warning non-nvcc code trajectory
#endif

Here we have a templated CUDA kernel with CUDA architecture dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for compilation of host code not steered by nvcc. This behaves as follows:

$ ln -s cudaarch.cu cudaarch.cc
$ gcc -c cudaarch.cc -o cudaarch.o
cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory

$ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:19:2: warning: #warning compiling with single precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info    : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
ptxas info    : Used 4 registers, 12+16 bytes smem

$ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:16:2: warning: #warning compiling with double precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info    : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
ptxas info    : Used 8 registers, 44 bytes cmem[0]

The takeaway points from this are:

  • __CUDACC__ indicates whether nvcc is steering compilation or not
  • __CUDA_ARCH__ is always undefined when compiling host code, steered by nvcc or not
  • __CUDA_ARCH__ is only defined for the device code trajectory of compilation steered by nvcc

Those three pieces of information are always enough to have conditional compilation for device code targeting different CUDA architectures, host-side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in the discussion on compilation trajectories.
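
As a side note on the two puzzles in the question (why the `__CUDA_ARCH__ >= 0` branch fires when the macro is undefined, and how to see its value), here is a minimal sketch. It relies on the standard preprocessor rule that an identifier which is not a defined macro is replaced by 0 inside an #if expression. The names STRINGIFY, ARCH_GE_ZERO_BRANCH_TAKEN, COMPILATION_PASS and report_pass are made up for illustration and do not come from nvcc or the question.

```cpp
#include <cstdio>

// Two-level stringification so that a defined macro is expanded to its
// value before being turned into a string literal.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Inside #if, an identifier that is not a defined macro is replaced by 0.
// An undefined __CUDA_ARCH__ therefore satisfies ">= 0", which is why the
// "using float in cuda" branch of the question fires on the host pass.
#if (__CUDA_ARCH__ >= 0)
#define ARCH_GE_ZERO_BRANCH_TAKEN 1
#else
#define ARCH_GE_ZERO_BRANCH_TAKEN 0
#endif

// Testing with defined() first separates the three cases listed above.
#if defined(__CUDA_ARCH__)
#define COMPILATION_PASS "nvcc device pass"
#elif defined(__CUDACC__)
#define COMPILATION_PASS "nvcc host pass"
#else
#define COMPILATION_PASS "non-nvcc host compiler"
#endif

// Host function: reports which pass generated it. When __CUDA_ARCH__ is
// undefined, STRINGIFY leaves the bare macro name in the string.
inline void report_pass()
{
    std::printf("%s, __CUDA_ARCH__ = %s\n",
                COMPILATION_PASS, STRINGIFY(__CUDA_ARCH__));
}
```

Built with a plain host compiler, calling report_pass() prints "non-nvcc host compiler, __CUDA_ARCH__ = __CUDA_ARCH__", and ARCH_GE_ZERO_BRANCH_TAKEN is 1 even though the macro was never defined. The practical consequence is to test defined(__CUDA_ARCH__) before comparing its value.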

#2


3  

For the moment the only practical solution I see is using a custom define:

#   if (!defined(__CUDACC__) ||  defined(USE_DOUBLE_IN_CUDA)) 
#       define real double
#       warning defining double for cuda or c/c++
#   else
#       define real float
#       warning defining float for cuda
#   endif

and then

nvcc -DUSE_DOUBLE_IN_CUDA -arch compute_13  -Ilibcutil testFloatDouble.cu

This outputs the following for the two compilations:

#warning defining double for cuda or c/c++
#warning defining double for cuda or c/c++

and

nvcc  -Ilibcutil testFloatDouble.cu 

outputs:

#warning defining float for cuda
#warning defining float for cuda
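
One possible refinement of this approach, sketched below under some assumptions: both nvcc passes have to agree on what real is, which is consistent with the undefined `myKernel(float*, int)` symbol in the question, where the two passes apparently disagreed. Because `__CUDA_ARCH__` is invisible to the host pass, the choice is driven by the custom USE_DOUBLE_IN_CUDA macro, and `__CUDA_ARCH__` is used only as a sanity check on the device pass. The file name real.h, the include guard, the helper axpy_element and the use of a typedef instead of a #define are illustrative choices, not part of the original answer.

```cpp
// real.h (hypothetical file name): one precision choice seen identically
// by the nvcc host pass, the nvcc device pass, and a plain C/C++ compiler.
#ifndef REAL_H
#define REAL_H

// USE_DOUBLE_IN_CUDA is the opt-in macro from the answer above, passed on
// the command line as -DUSE_DOUBLE_IN_CUDA.
#if !defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA)
typedef double real;
#else
typedef float real;
#endif

// Only the device pass sees __CUDA_ARCH__, so this is the one place where
// a mismatch between the requested precision and the target architecture
// can be detected at compile time.
#if defined(__CUDA_ARCH__) && defined(USE_DOUBLE_IN_CUDA) && (__CUDA_ARCH__ < 130)
#error "USE_DOUBLE_IN_CUDA requires compute capability >= 1.3"
#endif

// Illustrative host helper written against real rather than a fixed type.
inline real axpy_element(real a, real x, real y)
{
    return a * x + y;
}

#endif // REAL_H
```

With a plain C/C++ compiler, real is double. Under nvcc it is float unless -DUSE_DOUBLE_IN_CUDA is given, and in that case a target below compute capability 1.3 should stop the build with the #error instead of silently changing the type.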
