Double or float, which is faster? (duplicate)

Date: 2021-12-06 02:58:20

This question already has an answer here:


I am reading "Accelerated C++". I found a sentence which states that "sometimes double is faster in execution than float in C++". After reading it, I got confused about how float and double work. Please explain this point to me.

8 answers

#1


40  

Depends on what the native hardware does.


  • If the hardware implements double natively (as the x86 does), then float is handled by extending it to the wider format, and the conversion costs time. In this case, double will be faster.

  • If the hardware implements only float, then emulating double with it will cost even more time. In this case, float will be faster.

  • If the hardware implements neither, both have to be implemented in software. In this case, both will be slow, but double will be slightly slower (more load and store operations at the least).

The quote you mention probably refers to the x86 platform, where the first case applies. But this doesn't hold true in general.

#2


23  

You can find a complete answer in this article:

What Every Computer Scientist Should Know About Floating-Point Arithmetic


This is a quote from a previous Stack Overflow thread on float vs. double regarding memory bandwidth:

If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from. If it's in L1 cache, the load time is negligible provided the data comes from a single cache line. If it spans more than one cache line, there's a small overhead. If it's from L2, it takes a while longer; if it's in RAM, it's longer still; and finally, if it's on disk, it takes a huge amount of time. So the choice of float or double is less important than the way the data is used. If you want to do a small calculation on lots of sequential data, a small data type is preferable. Doing a lot of computation on a small data set would allow you to use bigger data types without any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant: data is loaded in pages / cache lines. So even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependent on the architecture of the system). On top of all of this, the CPU/FPU could be superscalar and pipelined. So, even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply, for instance) that hides the load time to a degree.

#3


10  

The short answer is: it depends.

A CPU with x87 will crunch floats and doubles equally fast. Vectorized code will run faster with floats, because SSE can crunch 4 floats or 2 doubles in one pass.

Another thing to consider is memory speed. Depending on your algorithm, your CPU could be idling a lot while waiting for the data. Memory-intensive code will benefit from using floats, but ALU-limited code won't (unless it is vectorized).

#4


3  

On Intel, the coprocessor (nowadays integrated) will handle both equally fast, but as some others have noted, doubles require more memory bandwidth, which can cause bottlenecks. If you're using scalar SSE instructions (the default for most compilers on 64-bit), the same applies. So generally, unless you're working on a large set of data, it doesn't matter much.

However, parallel SSE instructions will allow four floats to be handled in one instruction, but only two doubles, so here float can be significantly faster.


#5


3  

I can think of three basic cases when doubles are faster than floats:

  1. Your hardware supports double operations but not float operations, so floats will be emulated in software and therefore be slower.

  2. You really need the precision of doubles. If you use floats anyway, you will have to use two floats to reach a precision similar to double. Emulating a true double with floats will be slower than using doubles in the first place.

  3. You do not necessarily need doubles, but your numeric algorithm converges faster thanks to their higher precision. Doubles might also offer enough precision to let you use a faster but numerically less stable algorithm in the first place.

For completeness' sake I also give some reasons for the opposite case of floats being faster. You can see for yourself which reasons dominate in your case:

  1. Floats are faster than doubles when you don't need double's precision, you are memory-bandwidth bound, and your hardware doesn't carry a penalty on floats.

  2. They conserve memory bandwidth because they occupy half the space per number.

  3. There are also platforms that can process more floats than doubles in parallel.

#6


2  

There is only one reason 32-bit floats can be slower than 64-bit doubles (or the 80-bit x87 format), and that is alignment. Other than that, floats take less memory, which generally means faster access and better cache performance. It also takes fewer cycles to process 32-bit instructions. And even when the (co)processor has no 32-bit instructions, it can perform them on 64-bit registers at the same speed. It is probably possible to create a test case where doubles are faster than floats, and vice versa, but my measurements of real statistics algorithms didn't show a noticeable difference.

#7


2  

In an experiment that adds 3.3 a total of 2000000000 times, the results are:

Summation time in s: 2.82 summed value: 6.71089e+07 // float
Summation time in s: 2.78585 summed value: 6.6e+09 // double
Summation time in s: 2.76812 summed value: 6.6e+09 // long double

So double is marginally faster here, and it is the default floating-point type in C and C++. It is more portable and is the default across the C and C++ library functions. Also, double has significantly higher precision than float.

Even Stroustrup recommends double over float:


"The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don't have that understanding, get advice, take the time to learn, or use double and hope for the best."


Perhaps the only case where you should use float instead of double is on 64-bit hardware with a modern gcc, because float is smaller: double is 8 bytes and float is 4 bytes.

#8


1  

float is usually faster; double offers greater precision. However, performance may vary in some cases if special processor extensions such as 3DNow! or SSE are used.
