Amount of shared memory available on a GPU

Time: 2021-02-25 16:59:02

How can I know the amount of shared memory available on my GPU? I'm interested in how large an array I can store in shared memory. My GPU is an Nvidia GeForce 650 Ti. I am using VS2013 with the CUDA toolkit for coding.


I would really appreciate it if someone could explain how I can figure it out myself, not just give a raw number.


1 solution

#1



Two ways:


  1. Read the documentation (programming guide). Your GeForce 650 Ti is a cc3.0 GPU. (If you want to learn how to discover that yourself, there is documentation for it, or see item 2.)


    For a cc3.0 GPU, it is a maximum of 48KB per threadblock.


  2. Programmatically, by running cudaGetDeviceProperties (documentation). The CUDA sample app deviceQuery demonstrates this.

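As a sketch of the programmatic approach (assuming the CUDA toolkit is installed and a device is present; the field names are from the CUDA runtime's cudaDeviceProp struct), querying the limits might look like:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query properties of device 0; a multi-GPU system may need a different index.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    // sharedMemPerBlock: maximum shared memory available to one threadblock, in bytes.
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    // sharedMemPerMultiprocessor: total shared memory per SM (see the EDIT below).
    printf("Shared memory per SM:    %zu bytes\n", prop.sharedMemPerMultiprocessor);
    return 0;
}
```

Compile with nvcc; on a cc3.0 GPU both numbers should print as 49152 bytes (48KB).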

EDIT: responding to the question below.


The 48KB limit per threadblock is a logical limit as seen from the perspective of kernel code. There are at least two other numbers:


  1. Total amount of shared memory per SM (this is also listed in the documentation (same as above) and available via cudaGetDeviceProperties (same as above).) For a cc3.0 GPU this is again 48KB. This will be one limit to occupancy; this particular limit being the total available per SM divided by the amount used by a threadblock. If your threadblock uses 40KB of shared memory, you can have at most 1 threadblock resident per SM, at a time, on a cc3.0 GPU. If your threadblock uses 20KB of shared memory, you could possibly have 2 threadblocks resident per SM, ignoring other limits to occupancy.


  2. Total amount per device/GPU. I consider this to be a less relevant/useful number. It is equal to the total number of SMs on your GPU multiplied by the total amount per SM. This number is not particularly meaningful, i.e. it does not communicate new information beyond the knowledge of the number of SMs on your GPU. I can't really think of a use for this number, at the moment.


SM as used above means "streaming multiprocessor", which is identified here. It is also just referred to as "multiprocessor", for example in table 12 linked above.

