Which is faster/preferred: memset or a for loop, to zero out an array of doubles?

Time: 2022-02-27 02:56:09
double d[10];
int length = 10;

memset(d, length * sizeof(double), 0);

//or

for (int i = length; i--;)
  d[i] = 0.0;

15 answers

#1


18  

Note that for memset you have to pass the number of bytes, not the number of elements because this is an old C function:

memset(d, 0, sizeof(double)*length);

memset can be faster since it is written in assembler, whereas std::fill is a template function which simply does a loop internally.

But for type safety and more readable code I would recommend std::fill(). It is the C++ way of doing things; consider memset only if a performance optimization is needed at this place in the code.
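To make the bytes-versus-elements distinction concrete, here is a minimal sketch of both calls side by side. The helper names zero_with_memset and zero_with_fill are made up for illustration; only std::memset and std::fill come from the standard library.

```cpp
#include <algorithm>  // std::fill
#include <cassert>
#include <cstddef>    // std::size_t
#include <cstring>    // std::memset

// memset works on raw bytes, so the third argument is a byte count.
void zero_with_memset(double* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    if (count != 0) {
        std::memset(data, 0, count * sizeof(*data));
    }
}

// std::fill works on elements, so the range is [data, data + count).
void zero_with_fill(double* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::fill(data, data + count, 0.0);
}
```

With std::fill there is no sizeof to forget and no argument order to mix up, which is the type-safety point made above.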

#2


37  

If you really care you should try and measure. However the most portable way is using std::fill():

std::fill( array, array + numberOfElements, 0.0 );
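If you do want to measure, something along these lines could be a starting point. It is only a rough sketch: memset_zero, fill_zero and time_zeroing_ns are hypothetical names, and a single timed call like this is noisy and sensitive to caching, so treat the numbers with suspicion and repeat the run many times.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Candidate 1: byte-wise zeroing.
void memset_zero(double* data, std::size_t count) {
    std::memset(data, 0, count * sizeof(*data));
}

// Candidate 2: element-wise zeroing.
void fill_zero(double* data, std::size_t count) {
    std::fill(data, data + count, 0.0);
}

// Times one call of zero_fn on buffer and returns the elapsed nanoseconds.
template <typename ZeroFn>
long long time_zeroing_ns(ZeroFn zero_fn, std::vector<double>& buffer) {
    assert(!buffer.empty());
    const auto start = std::chrono::steady_clock::now();
    zero_fn(buffer.data(), buffer.size());
    const auto stop = std::chrono::steady_clock::now();
    // Read one element through a volatile so the stores are not optimized away.
    volatile double sink = buffer[buffer.size() / 2];
    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Call it as time_zeroing_ns(memset_zero, buffer) and time_zeroing_ns(fill_zero, buffer) on buffers of the size you actually care about, with optimizations enabled.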

#3


11  

Try this, if only to be cool xD

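{
    // Duff's device: the switch jumps into the middle of an 8x-unrolled loop.
    double *to = d;
    if (length > 0) {                 // guard: without it, length == 0 would still write 8 elements
        int n = (length + 7) / 8;     // number of passes through the do-while
        switch (length % 8) {
            case 0: do { *to++ = 0.0;
            case 7:      *to++ = 0.0;
            case 6:      *to++ = 0.0;
            case 5:      *to++ = 0.0;
            case 4:      *to++ = 0.0;
            case 3:      *to++ = 0.0;
            case 2:      *to++ = 0.0;
            case 1:      *to++ = 0.0;
                    } while (--n > 0);
        }
    }
}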

#4


6  

memset(d,0,10*sizeof(*d));

is likely to be faster. As others have said, you can also use

std::fill_n(d,10,0.);

but that is most likely just a prettier way of writing the loop.

#5


4  

In addition to the several bugs and omissions in your code, using memset is not portable. You can't assume that a double with all zero bits is equal to 0.0. First make your code correct, then worry about optimizing.
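One way to keep memset while making the assumption explicit is to check it at compile time. The sketch below uses std::numeric_limits<double>::is_iec559, which reports whether double follows IEC 60559 (IEEE 754), where positive zero is all zero bits; the function name zero_doubles is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>

// Refuse to compile on platforms where all-zero bits are not guaranteed to be +0.0.
static_assert(std::numeric_limits<double>::is_iec559,
              "memset-based zeroing assumes IEEE 754 (IEC 60559) doubles");

// Zeroes count doubles byte-wise; safe only because of the static_assert above.
void zero_doubles(double* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    if (count != 0) {
        std::memset(data, 0, count * sizeof(*data));
    }
}
```

On an unusual platform the build fails with a clear message instead of silently producing non-zero values.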

#6


4  

Assuming the loop length is an integral constant expression, the most probable outcome is that a good optimizer will recognize both the for-loop and the memset(0). The result would be that the generated assembly is essentially the same. Perhaps the choice of registers could differ, or the setup. But the marginal cost per double should really be the same.

#7


3  

calloc(length, sizeof(double))

According to IEEE-754, the bit representation of a positive zero is all zero bits, and there's nothing wrong with requiring IEEE-754 compliance. (If you need to zero out the array to reuse it, then pick one of the above solutions).

#8


3  

According to the Wikipedia article on IEEE 754-1985 64-bit floating point, a bit pattern of all 0s will indeed properly initialize a double to 0.0. Unfortunately, your memset code doesn't do that.

Here is the code you ought to be using:

memset(d, 0, length * sizeof(double));

As part of a more complete package...

{
    double *d;
    int length = 10;
    d = malloc(sizeof(d[0]) * length);
    memset(d, 0, length * sizeof(d[0]));
}

Of course, that's dropping the error checking you should be doing on the return value of malloc. sizeof(d[0]) is slightly better than sizeof(double) because it's robust against changes in the type of d.

Also, if you use calloc(length, sizeof(d[0])) it will clear the memory for you and the subsequent memset will no longer be necessary. I didn't use it in the example because then it seems like your question wouldn't be answered.
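For completeness, here is a sketch of both allocation routes with the malloc error check put back in. It is written as C++ (hence the casts), and the helper names malloc_zeroed and calloc_zeroed are invented for this example; the caller is assumed to release the memory with std::free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// malloc, then an explicit null check, then memset. Returns nullptr on failure.
double* malloc_zeroed(std::size_t length) {
    assert(length > 0);
    double* d = static_cast<double*>(std::malloc(sizeof(d[0]) * length));
    if (d == nullptr) {
        return nullptr;  // allocation failed; nothing to clear
    }
    std::memset(d, 0, length * sizeof(d[0]));
    return d;
}

// calloc allocates and clears in one call. It also returns nullptr on failure.
double* calloc_zeroed(std::size_t length) {
    assert(length > 0);
    return static_cast<double*>(std::calloc(length, sizeof(double)));
}
```

Both variants rely on all-zero bits representing 0.0, as discussed above.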

#9


3  

The example will not work because you have to allocate memory for your array. You can do this on the stack or on the heap.

Here is an example of doing it on the stack:

double d[50] = {0.0};

No memset is needed after that.
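This works because elements without an explicit initializer are value-initialized, which means 0.0 for double. A small sketch of the same idea, with an empty initializer list and with std::array (the function names are hypothetical, chosen only for this example):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Returns true if every element in [data, data + count) equals 0.0.
bool all_zero(const double* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (data[i] != 0.0) {
            return false;
        }
    }
    return true;
}

// Built-in array: {} has the same effect as {0.0}; every element becomes 0.0.
bool builtin_array_starts_zeroed() {
    double d[50] = {};
    return all_zero(d, 50);
}

// std::array: returning {} value-initializes all 50 elements.
std::array<double, 50> make_zeroed_array() {
    return {};
}
```

This only covers the initial state. To reset an array that has already been used, one of the fill or memset approaches is still needed.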

#10


2  

memset(d, 10, 0) is wrong: with the arguments in that order it writes zero bytes, and memset(d, 0, 10) would null only 10 bytes rather than 10 doubles. Prefer std::fill, as the intent is clearest.

#11


1  

Don't forget to compare a properly optimized for loop if you really care about performance.

Try some variant of Duff's device if the array is sufficiently long, and use prefix --i rather than suffix i-- (although most compilers will probably correct that automatically).

Although I'd question if this is the most valuable thing to be optimising. Is this genuinely a bottleneck for the system?

#12


1  

In general memset is going to be much faster. Make sure you get your length right; obviously your example has not (m)allocated or defined the array of doubles. If it truly is going to end up with only a handful of doubles, then the loop may turn out to be faster. But once you get to the point where the fill loop overshadows the handful of setup instructions, memset will typically win, because it uses larger and sometimes aligned chunks to maximize speed.

As usual, test and measure (although in this case you end up in the cache and the measurement may turn out to be bogus).

#13


0  

I think you mean

memset(d, 0, length * sizeof(d[0]))

and

for (int i = length; --i >= 0; ) d[i] = 0;

Personally, I do either one, but I suppose std::fill() is probably better.

#14


0  

If you're required to not use STL...

double aValues [10];
ZeroMemory (aValues, sizeof(aValues));

ZeroMemory at least makes the intent clear.
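ZeroMemory is specific to the Windows API. If the goal is mainly a call whose name states the intent, a small portable helper can do the same job. The sketch below is one possible shape, not an existing library function: zero_range and zero_array are invented names, and it does use std::fill_n from <algorithm>, so it only fits if that header is allowed.

```cpp
#include <algorithm>  // std::fill_n
#include <cassert>
#include <cstddef>

// Zeroes count elements starting at data (size known only at run time).
template <typename T>
void zero_range(T* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::fill_n(data, count, T());
}

// Zeroes a built-in array; N is deduced, so the size cannot be mistyped.
template <typename T, std::size_t N>
void zero_array(T (&array)[N]) {
    zero_range(array, N);
}
```

With this, zero_array(aValues) reads much like the ZeroMemory call but needs no sizeof and works for any element type that can be value-initialized.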

#15


0  

As an alternative to everything proposed so far, I would suggest NOT setting the array to all zeros at startup. Instead, set a value to zero only when you first access that particular cell. This sidesteps your question and may be faster.
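A rough sketch of what zero-on-first-access could look like follows. LazyZeroArray is a made-up class for illustration, and it assumes one flag per cell to remember which cells have been touched. Those flags themselves have to be cleared up front, so whether this beats a plain memset depends on the access pattern and should be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical wrapper: each cell is set to 0.0 the first time it is accessed.
class LazyZeroArray {
public:
    explicit LazyZeroArray(std::size_t length)
        : values_(new double[length]),  // storage deliberately left uninitialized
          touched_(length, false) {}    // one "already zeroed" flag per cell

    // Zeroes the cell on first access, then returns a reference to it.
    double& operator[](std::size_t index) {
        assert(index < touched_.size());
        if (!touched_[index]) {
            values_[index] = 0.0;
            touched_[index] = true;
        }
        return values_[index];
    }

    std::size_t size() const { return touched_.size(); }

private:
    std::unique_ptr<double[]> values_;
    std::vector<bool> touched_;
};
```

Every access now pays for a flag check, so this tends to make sense only when the array is large and sparsely used.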
