64-bit copy from a uint32_t[16] array to a sequence of uint32_t variables

Time: 2021-07-12 19:58:23

I have been able to use a 64-bit copy on equal-sized uint32_t arrays for a performance gain, and I want to do the same when copying from a uint32_t[16] array into a sequence of 16 uint32_t variables. I cannot replace the variables with an array, as that causes a performance regression.

I noticed that the compiler assigns consecutive addresses to a series of declared uint32_t variables, but in reverse: the last variable gets the lowest address, and the addresses increase by 4 bytes up to the first declared variable. I tried taking the address of that final variable and casting it to a uint64_t * pointer, but this did not work. The elements of the uint32_t[16] array, however, are laid out in ascending order.

Here is an example of my most recent attempt.

uint32_t x00,x01,x02,x03,x04,x05,x06,x07,x08,x09,x10,x11,x12,x13,x14,x15;
uint64_t *Bu64ptr = (uint64_t *) B;
uint64_t *x15u64ptr = (uint64_t *) &x15;

/* This is an inline function that does 64-bit eqxor on two uint32_t[16] 
& stores the results in uint32_t B[16]*/
salsa8eqxorload64(B,Bx);

/* Trying to 64-bit copy here */
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;
*x15u64ptr++ = *Bu64ptr++;

Am I pursuing the impossible, or is my lack of skill getting in the way again? Using the method below, I printed the address held in x15u64ptr and the address of x15, and they are completely different.

printf("x15u64ptr %p\n", (void *) x15u64ptr);
printf("x15 %p\n", (void *) &x15);

I had one idea: create an array and use the x?? variables as pointers to its individual elements, then perform the 64-bit copy between the two arrays, hoping that this would assign the values to the uint32_t variables. The compiler rejected this with a message about an invalid lvalue for the = assignment, so maybe I am getting the syntax wrong. Using 64-bit memcpy alternatives and a custom 64-bit eqxor, I have already increased the performance of the hashing function by over 10%, and I expect this change to give another 5-10% improvement if I can get it to work.

1 answer

#1


2  

There is no guarantee that the variables will be placed in memory in the order of their declaration.
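
Struct members, by contrast, are laid out in declaration order, so one possible workaround is to keep the sixteen named words in a struct and fill it with a single fixed-size memcpy, which an optimizing compiler is free to implement with 64-bit or wider moves. The sketch below is only an illustration: the names `salsa_words` and `load_words` are hypothetical, and the static_assert checks the assumption that the struct contains no padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the sixteen named words. Struct members are
   laid out in declaration order, unlike separate local variables. */
typedef struct {
    uint32_t x00, x01, x02, x03, x04, x05, x06, x07;
    uint32_t x08, x09, x10, x11, x12, x13, x14, x15;
} salsa_words;

/* Assumption: no padding between the members (true on common ABIs). */
static_assert(sizeof(salsa_words) == 16 * sizeof(uint32_t),
              "unexpected padding in salsa_words");

/* Copy all sixteen words at once. memcpy has no aliasing or alignment
   requirements, and the fixed size lets the compiler choose wide moves. */
static inline void load_words(salsa_words *dst, const uint32_t src[16])
{
    memcpy(dst, src, sizeof *dst);
}
```

After `load_words(&w, B)`, the words are available as `w.x00` through `w.x15`. Whether this avoids the regression seen with a plain array would have to be measured.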

I would use union punning myself.

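#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define LITTLEENDIAN 1   /* set to 0 on a big-endian target */

typedef union
{
    uint32_t u32[2];
    uint64_t u64;
} data_64;

int main(void)
{
    /* Stand-in for the real source buffer; the original pointed at a fixed example address. */
    uint64_t source[10];
    for (size_t index = 0; index < sizeof(source) / sizeof(source[0]); index++)
    {
        source[index] = ((uint64_t)(2 * index + 1) << 32) | (2 * index);
    }

    const uint64_t *Bu64ptr = source;

    data_64 mydata[10];

    /* you can copy memory */
    memcpy(mydata, Bu64ptr, sizeof(mydata));

    /* or just loop */
    for (size_t index = 0; index < sizeof(mydata) / sizeof(mydata[0]); index++)
    {
        mydata[index].u64 = *Bu64ptr++;
    }

    /* read each 64-bit value back as two 32-bit words through the union */
    for (size_t index = 0; index < sizeof(mydata) / sizeof(mydata[0]); index++)
    {
        printf("Lower word = %" PRIx32 ", Upper word = %" PRIx32 "\n",
               mydata[index].u32[!LITTLEENDIAN], mydata[index].u32[LITTLEENDIAN]);
    }

    return 0;
}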

It will work exactly the same way in the opposite direction: write the 32-bit words through u32 and read them back as 64-bit values through u64.
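
Scaled up to the sizes in the question, the same union idea might look like the sketch below. The type `block512` and the function `copy_block64` are hypothetical names, not part of the original code. The sketch relies on union type punning as permitted in C (it is not valid C++), and it moves one 64-byte block with eight 64-bit assignments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-byte block, viewable as 16 x 32-bit or 8 x 64-bit. */
typedef union {
    uint32_t u32[16];
    uint64_t u64[8];
} block512;

/* Both views are expected to cover the same 64 bytes. */
static_assert(sizeof(block512) == 64, "unexpected block512 size");

/* Copy a whole block using eight 64-bit assignments. */
static inline void copy_block64(block512 *dst, const block512 *src)
{
    for (int i = 0; i < 8; i++)
        dst->u64[i] = src->u64[i];
}
```

If both B and the working words are declared as `block512`, the individual 32-bit words stay accessible as `u32[0]` through `u32[15]` on either side of the copy.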
