Speed difference between char and integer arrays?

Time: 2021-10-11 12:47:20

Currently I'm working on video processing software in which the picture data (8-bit signed and unsigned) is stored in arrays of 16-byte-aligned integers, allocated as

__declspec(align(16)) int *pData = (__declspec(align(16)) int *)_mm_malloc(width*height*sizeof(int),16);

Generally, wouldn't reading and writing be faster if one used signed/unsigned char arrays like this?

__declspec(align(16)) unsigned char *pData = (__declspec(align(16)) unsigned char *)_mm_malloc(width*height*sizeof(unsigned char),16);
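
For comparison, here is a minimal sketch of the byte-per-pixel allocation without the MSVC-specific `__declspec`. As far as I can tell, `__declspec(align(16))` on a pointer declaration affects the pointer variable rather than the memory it points to, so the buffer alignment comes from `_mm_malloc` alone. The helper names `alloc_pixels_u8` and `free_pixels_u8` are hypothetical, and the `aligned_alloc` fallback for non-x86 compilers is an assumption of this sketch, not something the question uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* pulls in _mm_malloc / _mm_free on common x86 compilers */
#define HAVE_MM_MALLOC 1
#endif

/* Allocate width*height 8-bit pixels on a 16-byte boundary; NULL on failure. */
unsigned char *alloc_pixels_u8(size_t width, size_t height)
{
    unsigned char *p;
    size_t bytes;

    if (width == 0 || height == 0 || width > SIZE_MAX / height)
        return NULL;                      /* empty image or size overflow */
    bytes = width * height;               /* sizeof(unsigned char) is 1 */
#ifdef HAVE_MM_MALLOC
    p = (unsigned char *)_mm_malloc(bytes, 16);
#else
    /* Assumed fallback: aligned_alloc wants a size that is a multiple of 16. */
    if (bytes > SIZE_MAX - 15)
        return NULL;
    p = (unsigned char *)aligned_alloc(16, (bytes + 15) & ~(size_t)15);
#endif
    assert(p == NULL || ((uintptr_t)p & 15) == 0);
    return p;
}

/* Release a buffer obtained from alloc_pixels_u8. */
void free_pixels_u8(unsigned char *p)
{
#ifdef HAVE_MM_MALLOC
    _mm_free(p);
#else
    free(p);
#endif
}
```

A 640x480 frame allocated this way occupies 307,200 bytes, against 1,228,800 bytes for the `int` version when `sizeof(int)` is 4.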

I know little about cache line size and data transfer optimization, but at least I know that it is an issue. Beyond that, SSE will be used in the future, and in that case char arrays, unlike int arrays, are already in a packed format. So which version would be faster?

4 answers

#1


4  

If you're planning to use SSE, storing the data in its native size (8-bit) is almost certainly the better choice, since loads of operations can be done without unpacking. Even if you need to unpack for pmaddwd or similar instructions, it's still faster because you have to load less data.
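
As one illustration of an operation that needs no unpacking, the sketch below adds two 8-bit images with saturation, 16 pixels per instruction, using the SSE2 intrinsic `_mm_adds_epu8`. The function name `add_saturate_u8` is hypothetical, and the choice of saturating addition is only an example; the scalar loop handles the tail and serves as a fallback when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define USE_SSE2 1
#endif

/* dst[i] = min(a[i] + b[i], 255) for n unsigned 8-bit pixels. */
void add_saturate_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
#ifdef USE_SSE2
    /* 16 pixels per iteration; unaligned loads keep the sketch general. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE2 is unavailable. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

With one pixel per `int`, the same 128-bit register would hold only 4 pixels, and the saturating byte instructions could not be applied directly.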

Even in scalar code, loading 8-bit or 16-bit values is no slower than loading 32-bit, since movzx/movsx is no different in speed from mov. So you just save memory, which surely can't hurt.
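
A small sketch of the scalar case: both loops below do the same work, and in C the widening from 8 bits happens implicitly on each load (on x86 a non-vectorized build would typically use `movzx` for it). The function names are hypothetical, and the `int32_t` version assumes pixel values in the range 0..255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n pixels stored one per byte; each load is zero-extended. */
uint64_t sum_u8(const uint8_t *p, size_t n)
{
    uint64_t total = 0;
    size_t i;

    assert(n == 0 || p != NULL);
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Same sum with one pixel per 32-bit int: four times as many bytes are read. */
uint64_t sum_i32(const int32_t *p, size_t n)
{
    uint64_t total = 0;
    size_t i;

    assert(n == 0 || p != NULL);
    for (i = 0; i < n; i++)
        total += (uint64_t)p[i];   /* assumes 0 <= p[i] <= 255 */
    return total;
}
```

Timing these two functions over a full frame on the target machine is a simple way to check the claim rather than take it on trust.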

#2


0  

It really depends on your target CPU -- you should read up on its specs and run some benchmarks as everyone has already suggested. Many factors could influence performance. The first obvious one that comes to my mind is that your array of ints is 2 to 4 times larger than an array of chars and, hence, if the array is big enough, you'll get fewer data cache hits, which will definitely slow down the performance.
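
To put a rough number on the cache argument, the sketch below counts how many cache lines a frame occupies for a given element size. The helper `cache_lines_touched` is hypothetical, and the 64-byte line size used in the note afterwards is an assumption that is common on x86 but should be checked for the actual target.

```c
#include <assert.h>
#include <stddef.h>

/* Cache lines needed for `pixels` elements of `elem_size` bytes,
   assuming the buffer starts on a cache-line boundary. */
size_t cache_lines_touched(size_t pixels, size_t elem_size, size_t line_size)
{
    size_t bytes;

    assert(line_size > 0);
    bytes = pixels * elem_size;
    return (bytes + line_size - 1) / line_size;   /* round up */
}
```

For a 1920x1080 frame and 64-byte lines this gives 32,400 lines with 1-byte pixels and 129,600 lines with 4-byte ints, so a full pass over the int version moves four times as much data through the cache hierarchy.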

#3


-1  

On the contrary, packing and unpacking cost extra CPU instructions.

If you want to do a lot of random pixel operations, it is faster to use an array of int so that each pixel has its own address.

But if you iterate through your image sequentially, you want a char array so that it is small and reduces the chance of a page fault (especially for large images).

#4


-1  

Char arrays can be slower in some cases. As a very general rule of thumb, the native word size is the best to go for, which will more than likely be 4 bytes (32-bit) or 8 bytes (64-bit). Even better is to have everything aligned to 16 bytes, as you have already done. This will enable faster copies if you use SSE instructions (for example the aligned move MOVDQA or the non-temporal store MOVNTDQ). If you are only concerned with moving items around, this will have a much greater impact than the type used by the array.
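
A minimal sketch of the kind of aligned SSE copy this answer alludes to, assuming SSE2: `_mm_load_si128` (MOVDQA) reads 16 aligned bytes and `_mm_stream_si128` (MOVNTDQ) writes them with a non-temporal store. The function name `copy_pixels` is hypothetical, and falling back to `memcpy` for unaligned pointers or non-x86 builds is a choice made for this sketch. Whether non-temporal stores beat a plain `memcpy` depends on the buffer size and the CPU, so it is worth measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define USE_SSE2 1
#endif

/* Copy n bytes between non-overlapping buffers. Uses aligned 16-byte loads
   and non-temporal stores when both pointers are 16-byte aligned. */
void copy_pixels(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#ifdef USE_SSE2
    if ((((uintptr_t)dst | (uintptr_t)src) & 15) == 0) {
        const __m128i *s = (const __m128i *)src;
        __m128i *d = (__m128i *)dst;
        size_t blocks = n / 16;
        size_t i;

        for (i = 0; i < blocks; i++)
            _mm_stream_si128(d + i, _mm_load_si128(s + i));
        _mm_sfence();   /* order the streaming stores before later accesses */

        /* Copy the remaining 0..15 bytes. */
        if (n % 16 != 0)
            memcpy((unsigned char *)dst + blocks * 16,
                   (const unsigned char *)src + blocks * 16, n % 16);
        return;
    }
#endif
    memcpy(dst, src, n);   /* unaligned pointers or no SSE2 */
}
```

The copy works in 16-byte blocks regardless of element type, which is why the alignment matters more here than whether the buffer holds chars or ints; the char buffer simply has a quarter of the bytes to move.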
