I have been able to use a 64-bit copy on equal-sized uint32_t arrays for a performance gain, and I want to do the same when loading a sequence of 16 uint32_t variables from a uint32_t[16] array. I cannot replace the variables with an array, because that causes a performance regression.
I noticed that the compiler assigns sequential addresses to a series of declared uint32_t variables, but in reverse: the last declared variable gets the lowest address, and the addresses increase by 4 bytes up to the first declared variable. I tried taking the address of that final variable as the destination start and casting it to a uint64_t * pointer, but this did not work. The element addresses of the uint32_t[16] array, however, are in ascending order.
Here is an example of my most recent attempt.
uint32_t x00,x01,x02,x03,x04,x05,x06,x07,x08,x09,x10,x11,x12,x13,x14,x15;
uint64_t *Bu64ptr = (uint64_t *) B;
uint64_t *x15u64ptr = (uint64_t *) &x15;
/* This is an inline function that does 64-bit eqxor on two uint32_t[16]
& stores the results in uint32_t B[16]*/
salsa8eqxorload64(B,Bx);
/* Trying to 64-bit copy here */
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
Am I pursuing the impossible, or is my lack of skill getting in the way again? I printed the address of x15 and the value of x15u64ptr using the method below, and they are completely different.
printf("x15u64ptr %p\n", (void *) x15u64ptr);
printf("x15 %p\n", (void *) &x15);
I had one idea: create an array, use the x?? variables as pointers to its individual elements, and then perform the 64-bit copy between the two arrays, hoping this would assign the values to the uint32_t variables. The compiler rejected it with a message about an invalid lvalue in the = assignment, so maybe I got the syntax wrong. Using 64-bit memcpy alternatives and a custom 64-bit eqxor, I have already increased the performance of the hashing function by over 10%, and I expect this change to give another 5-10% improvement if I can get it to work.
1 solution
#1
There is no guarantee that separately declared variables are placed in memory in the order of their declaration.
I would use union punning myself.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#define SOMETHING (uint64_t *)0x12345676 // only
#define LITTLEENDIAN 1
typedef union
{
uint32_t u32[2];
uint64_t u64;
}data_64;
int main()
{
uint64_t *Bu64ptr = SOMETHING;
data_64 mydata[10];
//you can copy memory
memcpy(mydata, Bu64ptr, sizeof(mydata));
//or just loop
for(size_t index = 0; index < sizeof(mydata) / sizeof(mydata[0]); index++)
{
mydata[index].u64 = *Bu64ptr++;
}
for(size_t index = 0; index < sizeof(mydata) / sizeof(mydata[0]); index++)
{
printf("Lower word = %x, Upper word = %x\n", (unsigned)mydata[index].u32[!LITTLEENDIAN], (unsigned)mydata[index].u32[LITTLEENDIAN]);
}
return 0;
}
It will work exactly the same way in the opposite direction.