How many decimal places do the primitive float and double support? [duplicate]

Date: 2022-05-10 11:17:56

I have read that double stores 15 digits and float stores 7 digits.

My question is, are these numbers the number of decimal places supported or total number of digits in a number?

4 answers

#1


3  

Those are the total number of "significant figures" if you will, counting from left to right, regardless of where the decimal point is. Beyond those numbers of digits, accuracy is not preserved.

The counts you listed are for the base 10 representation.
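
To make the "significant figures" point concrete, here is a minimal C sketch. It assumes float is an IEEE 754 32-bit type and that strtof rounds to the nearest float; the helper name stored_as_float is made up for this illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a decimal string into a float and return the
   value that was actually stored, widened to double so that it can be
   printed or compared in full. Assumes an IEEE 754 32-bit float. */
double stored_as_float(const char *decimal)
{
    float f = strtof(decimal, NULL);  /* nearest float to the decimal text */
    return (double)f;                 /* widening to double is exact */
}
```

On such a platform, stored_as_float("123456789") gives 123456792 and stored_as_float("1234.56789") gives 1234.56787109375. The leading digits survive wherever the decimal point sits; the trailing ones do not.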

#2


2  

There are macros, defined in <float.h>, for the number of decimal digits of precision each type supports. The gcc docs explain what they are and what they mean:

FLT_DIG

This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) for the representation, then the decimal precision q is the maximum number of decimal digits such that any floating point number with q base 10 digits can be rounded to a floating point number with p base b digits and back again, without change to the q decimal digits.

The value of this macro is supposed to be at least 6, to satisfy ISO C.

DBL_DIG
LDBL_DIG

These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.

On both gcc 4.9.2 and clang 3.5.0, FLT_DIG and DBL_DIG yield 6 and 15, respectively.
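
The round-trip guarantee behind these macros can be tried out with a short C sketch. It assumes an IEEE 754 float and a C library whose %g output uses a two-digit exponent such as e+09 (as glibc does); survives_float_roundtrip is a hypothetical helper, not a standard function.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `decimal` comes back unchanged after
   being stored in a float and printed again with `digits` significant
   digits. The input must be written the way %g would print it (no
   trailing zeros, exponent as e+NN), because the check compares text. */
int survives_float_roundtrip(const char *decimal, int digits)
{
    char back[64];
    float f = strtof(decimal, NULL);                 /* text -> float */
    snprintf(back, sizeof back, "%.*g", digits, (double)f);  /* float -> text */
    return strcmp(decimal, back) == 0;
}
```

With FLT_DIG digits the check passes for inputs such as "9.87654". With 7 digits it can fail: "8.589973e+09" is stored as 8589973504, which prints back as 8.589974e+09. That is why FLT_DIG is 6 rather than 7 on these platforms.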

#3


1  

are these numbers the number of decimal places supported or total number of digits in a number?

They are the significant digits contained in every number (you may not need all of them, but they're still there). The mantissa of a given type always contains the same number of bits, so every number consequently contains the same number of valid "digits" if you think in terms of decimal digits. You cannot store more digits than will fit into the mantissa.

The number of "supported" digits is, however, much larger: for example, float will usually support numbers up to about 38 decimal digits long and double up to about 308, but most of these digits are not significant (that is, "unknown").

Although technically, this is wrong, since float and double do not have universally well-defined sizes like I presumed above (they're implementation-defined). Also, storage sizes are not necessarily the same as the sizes of intermediate results.
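
The gap between range and precision described above can be sketched in C. This assumes IEEE 754 types; increment_is_lost is a hypothetical helper, not a library function. The macros FLT_MAX_10_EXP and DBL_MAX_10_EXP from <float.h> report the largest usable power of ten (38 and 308 on such platforms), while the helper shows that most of those digits carry no information.

```c
#include <float.h>

/* Hypothetical helper: returns 1 when adding `increment` to `base` leaves
   the stored float unchanged, meaning the increment falls below the
   precision available at that magnitude. */
int increment_is_lost(float base, float increment)
{
    /* volatile forces the sum to be rounded to a real float, even on
       hardware that evaluates expressions in wider registers */
    volatile float sum = base + increment;
    return sum == base;
}
```

For instance, increment_is_lost(1e38f, 1.0f) reports that the addition has no effect, and so does increment_is_lost(16777216.0f, 1.0f), since 16777216 is 2^24 and a 24-bit mantissa cannot hold 2^24 + 1.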

The C++ standard is very reluctant to define any fundamental type precisely, leaving almost everything to the implementation. Floating point types are no exception:

3.9.1 / 8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

Now of course all of this is not particularly helpful in practice.

In practice, floating point is (usually) IEEE 754 compliant, with float having a width of 32 bits and double having a width of 64 bits (as stored in memory; registers have higher precision on some notable mainstream architectures).

This is equivalent to 24 bits and 53 bits of mantissa, respectively, or roughly 7 and 15 decimal digits (of which 6 and 15 are guaranteed to survive a round trip through decimal text).
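
The step from mantissa bits to guaranteed decimal digits follows the formula the C standard gives for the *_DIG macros when the base is not a power of 10: floor((p - 1) * log10(b)). A minimal sketch for base 2, where guaranteed_decimal_digits and the LOG10_2 constant are names invented for this example:

```c
#include <float.h>

/* log10(2), written out so the sketch does not need <math.h>. */
#define LOG10_2 0.30102999566398120

/* Hypothetical helper: decimal digits guaranteed to round-trip through a
   binary format with `mantissa_bits` bits of precision (hidden bit
   included), computed as floor((p - 1) * log10(2)). */
int guaranteed_decimal_digits(int mantissa_bits)
{
    /* the product is positive, so truncation equals floor */
    return (int)((mantissa_bits - 1) * LOG10_2);
}
```

For 24 and 53 bits this gives 6 and 15. On a base-2 implementation, passing FLT_MANT_DIG or DBL_MANT_DIG should reproduce FLT_DIG and DBL_DIG.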

#4


0  

If you are on an architecture using IEEE-754 floating point arithmetic (as in most architectures), then the type float corresponds to single precision, and the type double corresponds to double precision, as described in the standard.

Let's work through some numbers:

Single precision:

32 bits represent the number, of which 24 bits are mantissa (23 stored bits plus the "hidden 1", which is implied rather than stored). The least significant bit is worth 2^(-23) relative to the MSB (the hidden 1), so rounding to nearest gives a relative error of at most 2^(-24), which is 10^(-7.22). Therefore, for a fixed exponent, the finest distinction that can be represented is about 10^(-7.22) times the scale set by that exponent. For a value written in scientific notation (3.141592653589 E 25), this means that only about "7.22" decimal digits are significant. In practice roughly 7 digits are meaningful, and 6 are guaranteed to survive a round trip through decimal text (FLT_DIG).

Double precision:

64 bits represent the number, of which 53 bits are mantissa (52 stored bits plus the hidden 1). Following the same reasoning, expressing 2^(-53) as a power of 10 gives 10^(-15.95), which in turn means that at least 15 decimal digits will always be correct.
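
The arithmetic above can be checked with a few lines of C. This is only a sketch under the assumption of IEEE 754 single precision with round-to-nearest; decimal_exponent_of, half_ulp_is_lost and LOG10_2 are illustrative names, not part of any library.

```c
#include <float.h>

/* log10(2), written out so the sketch does not need <math.h>. */
#define LOG10_2 0.30102999566398120

/* Hypothetical helper: returns d such that 2^(-bits) equals 10^(-d). */
double decimal_exponent_of(int bits)
{
    return bits * LOG10_2;
}

/* Hypothetical helper: returns 1 if 1.0f + 2^(-24) is rounded back to
   1.0f, i.e. a change of half a unit in the last place cannot be stored.
   Assumes FLT_EPSILON is 2^(-23), as it is for IEEE 754 single precision. */
int half_ulp_is_lost(void)
{
    volatile float x = 1.0f + FLT_EPSILON / 2;  /* volatile forces float rounding */
    return x == 1.0f;
}
```

Here decimal_exponent_of(24) comes out at about 7.22 and decimal_exponent_of(53) at about 15.95, matching the figures in the answer.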

