Best practices for storing weights in a SQL database?

Time: 2021-12-29 16:58:04

An application I'm working on needs to store weights of the format X pounds, y.y ounces. The database is MySQL, but I imagine this is DB agnostic.

I can think of three ways to do this:

  1. Convert the weight to decimal pounds and store in a single field. (5 lbs 6.2 oz = 5.33671875 lbs)
  2. Convert the weight to decimal ounces and store in a single field. (5 lbs 6.2 oz = 86.2 oz)
  3. Store the pounds portion as an integer and the ounces portion as a decimal, in two fields.

I'm thinking that #1 is not such a good idea, since decimal pounds will produce numbers of arbitrary precision, which would need to be stored as a float, which could lead to inaccuracies which are inherent in floating point numbers.

Is there a compelling reason to choose #2 over #3 or vice versa?

3 Answers

#1


27  

TL;DR

Choose either option #1 or option #2—there's no difference between them. Don't use option #3, because it's awkward to work with.

You claim that there are inherent inaccuracies in floating point numbers. I think that this deserves a little explanation.

When deciding upon a numeral system for representing a number (whether on a piece of paper, in a computer circuit, or elsewhere), there are two separate issues to consider:

  1. its basis; and

  2. its format.

Pick a base, any base…

Limited by finite space, one cannot represent an arbitrary member of an infinite set. For example: no matter how much paper you buy or how small your handwriting, it'd always be possible to find an integer that won't fit in the given space (you could just keep appending extra digits until the paper runs out). So, with integers, we usually restrict our finite space to representing only those that fall within some particular interval—e.g. if we have space for three digits, we might restrict ourselves to the interval [-999,999].

Every non-empty interval contains an infinite set of real numbers. In other words, no matter what interval one takes over the real numbers—be it [-999,999], [0,1], [0.000001,0.000002] or anything else—there is still an infinite set of reals within that interval! Therefore arbitrary real numbers must always be "rounded" to something that can be represented in finite space.

The set of real numbers that can be represented in finite space depends upon the numeral system that is used. In our (familiar) positional base-10 system, finite space will suffice for one-half (0.5₁₀) but not for one-third (0.33333…₁₀); by contrast, in the (less familiar) positional base-9 system, it is the other way around (those same numbers are respectively 0.44444…₉ and 0.3₉). Irrational numbers always require infinite space in standard positional systems. The consequence of all this is that some numbers that can be represented using only a small amount of space in positional base-10 (and therefore appear to be very "round" to us humans) would actually require infinite binary circuits for storage (and therefore don't appear to be very "round" to our digital friends)!
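
As a rough illustration of the point about bases, here is a minimal Python sketch that checks whether a fraction has a finite expansion in a given base. The function name `terminates` is made up for this example; it relies on the standard fact that a fraction in lowest terms terminates in base b exactly when every prime factor of its denominator also divides b.

```python
from fractions import Fraction
from math import gcd

def terminates(value: Fraction, base: int) -> bool:
    """Return True if `value` has a finite positional expansion in `base`."""
    denominator = value.denominator  # Fraction always keeps lowest terms
    # Strip from the denominator every prime factor it shares with the base.
    while (shared := gcd(denominator, base)) > 1:
        denominator //= shared
    # Anything left over is a prime factor the base lacks, so digits repeat forever.
    return denominator == 1

half, third = Fraction(1, 2), Fraction(1, 3)
print(terminates(half, 10), terminates(third, 10))  # True False
print(terminates(half, 9), terminates(third, 9))    # False True
print(terminates(Fraction("86.2"), 10), terminates(Fraction("86.2"), 2))  # True False
```

The last line shows the case that matters here: 86.2 looks round in decimal but has no finite binary expansion.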

We can't do any better for continuous quantities. Ultimately such quantities must use a finite representation in some numeral system: it's arbitrary whether that system happens to be easy on computer circuits, on human fingers, on something else or on nothing at all—whichever system is used, the value must be rounded and therefore it always results in "representation error".

In other words, even if one has a perfectly precise measuring instrument (which is physically impossible), then any measurement it reports will already have been rounded to a number that happens to fit on its display (in whatever base it uses—typically decimal, for obvious reasons). So, "86.2 oz" is never actually "86.2 oz" but rather a representation of "something between 86.1500000... oz and 86.2499999... oz". (Actually, because in reality the instrument is imperfect, all we can ever really say is that we have some degree of confidence that the actual value falls within that interval—but that is definitely departing some way from the point here).

But we can do better for discrete quantities. Such values are not "arbitrary real numbers" and therefore none of the above applies to them: they can be represented exactly in the numeral system in which they were defined—and indeed, should be (as converting to another numeral system and truncating to a finite length would result in rounding to an inexact number). Computers can (inefficiently) handle such situations by representing the number as a string: e.g. consider ASCII or BCD encoding.

Apply a format…

Since it's a property of the numeral system's (somewhat arbitrary) basis, whether or not a value appears to be "round" has no bearing on its precision. That's a really important observation, which runs counter to many people's intuition (and it's the reason I spent so much time explaining numerical basis above).

Precision is instead determined by how many significant figures a representation has. We need a storage format that is capable of recording our values to at least as many significant figures as we consider them to be correct. Taking by way of example values that we consider to be correct when stated as 86.2 and 0.0000862, the two most common options are:

  • Fixed point, where the number of significant figures depends on magnitude: e.g. in fixed 5-decimal-point representation, our values would be stored as 86.20000 and 0.00009 (and therefore have 7 and 1 significant figures of precision respectively). In this example, precision has been lost in the latter value (and indeed, it wouldn't take much more for us to have been totally unable to represent anything of significance); and the former value stored false precision, which is a waste of our finite space (and indeed, it wouldn't take much more for the value to become so large that it overflows the storage capacity).

    A common example of when this format might be appropriate is for an accounting system: currency must usually be tracked to the penny irrespective of the monetary sum (therefore less precision is required for small values, but greater precision is required for large values). As it happens, currency is usually also considered to be discrete (pennies are indivisible), so this is also a good example of a situation where a particular basis (decimal for most modern currencies) is desirable to avoid the representation errors discussed above.

    One usually implements fixed point storage by treating one's values as quotients over a common denominator and storing the numerator as an integer. In our example, the common denominator could be 10⁵, so instead of 86.20000 and 0.00009 one would store the integers 8620000 and 9 and remember that they must be divided by 100000.

  • Floating point, where the number of significant figures is constant irrespective of magnitude: e.g. in 5-significant-figure decimal representation, our values would be stored as 86.200 and 0.000086200 (and, by definition, have 5 significant figures of precision both times). In this example, both values have been stored without any loss of precision; and they both also have the same amount of false precision, which is less wasteful (and we can therefore use our finite space to represent a far greater range of values—both large and small).

    A common example of when this format might be appropriate is for recording any real world measurements: the precision of measuring instruments (which all suffer from both systematic and random errors) is fairly constant irrespective of scale so, given sufficient significant figures (typically around 3 or 4 digits), absolutely no precision is lost even if a change of base resulted in rounding to a different number.

    One usually implements floating point storage by treating one's values as integer significands with integer exponents. In our example, the significand could be 86200 for both values whereupon the (base-10) exponent would be -3 and -9 respectively (86200 × 10⁻³ = 86.2 and 86200 × 10⁻⁹ = 0.0000862).

    But how precise are the floating point storage formats used by our computers?

    • An IEEE754 single precision (binary32) floating point number has 24 bits, or log₁₀(2²⁴) (over 7) digits, of significance; i.e. it has a tolerance of less than ±0.000006%. In other words, it is more precise than saying "86.20000".

    • An IEEE754 double precision (binary64) floating point number has 53 bits, or log₁₀(2⁵³) (almost 16) digits, of significance; i.e. it has a tolerance of just over ±0.00000000000001%. In other words, it is more precise than saying "86.2000000000000".

    The most important thing to realise is that these formats are, respectively, over ten thousand and over one trillion times more precise than saying "86.2", even though their representations in binary happen to round to numbers that appear less "exact" in decimal (more on this shortly)!
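
The two storage formats in the list above can be sketched in a few lines of Python. This is only a toy model using the standard `decimal` module, not how MySQL or IEEE754 hardware actually lays out its bits; the names `FIXED_SCALE`, `SIG_DIGITS`, `to_fixed` and `to_floating` are invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

FIXED_SCALE = 10**5   # common denominator: five decimal places
SIG_DIGITS = 5        # digits kept by the toy floating-point format

def to_fixed(text: str) -> int:
    """Store a decimal string as an integer numerator over FIXED_SCALE."""
    scaled = Decimal(text) * FIXED_SCALE
    return int(scaled.to_integral_value(rounding=ROUND_HALF_EVEN))

def to_floating(text: str) -> tuple[int, int]:
    """Store a non-zero decimal string as (significand, base-10 exponent)."""
    value = Decimal(text)
    # adjusted() is the exponent of the leading digit; shift the value so
    # that exactly SIG_DIGITS digits sit to the left of the decimal point.
    exponent = value.adjusted() - (SIG_DIGITS - 1)
    significand = value.scaleb(-exponent).to_integral_value(rounding=ROUND_HALF_EVEN)
    return int(significand), exponent

for text in ("86.2", "0.0000862"):
    print(text, to_fixed(text), to_floating(text))
# 86.2 8620000 (86200, -3)
# 0.0000862 9 (86200, -9)
```

The fixed-point column keeps only one significant digit of the small value, whereas the floating-point pair keeps the same five digits for both and moves only the exponent.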

Notice also that both fixed and floating point formats will result in loss of precision when a value is known more precisely than the format supports. Such rounding errors can propagate in arithmetic operations to yield apparently erroneous results (which no doubt explains your reference to the "inherent inaccuracies" of floating point numbers): for example, 1/3 × 3000 in 5-place fixed point would yield 999.99000 rather than 1000.00000; and 10/81 − 3/25 in 5-significant-figure floating point would yield 0.0034600 rather than 0.0034568.
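
Both of those examples can be reproduced with Python's `decimal` module. This is a sketch that emulates the two decimal formats in software (it is an assumption that round-half-even is the rounding rule in play; the examples come out the same with ordinary round-half-up).

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN
from fractions import Fraction

# Fixed point with 5 decimal places: 1/3 is rounded before it is multiplied.
third = (Decimal(1) / Decimal(3)).quantize(Decimal("0.00001"), rounding=ROUND_HALF_EVEN)
fixed_result = third * 3000

# Floating point with 5 significant figures: every operation rounds to 5 digits.
five_sig = Context(prec=5, rounding=ROUND_HALF_EVEN)
a = five_sig.divide(Decimal(10), Decimal(81))   # 0.12346
b = five_sig.divide(Decimal(3), Decimal(25))    # 0.12
float_result = five_sig.subtract(a, b)

# Exact rational arithmetic for comparison.
exact = Fraction(10, 81) - Fraction(3, 25)

print(fixed_result)   # 999.99000
print(float_result)   # 0.00346
print(float(exact))   # approximately 0.0034568
```

In the second case the subtraction cancels the leading digits, so only three of the five stored figures survive in the result.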

The field of numerical analysis is dedicated to understanding these effects, but it is important to realise that any usable system (even performing calculations in your head) is vulnerable to such problems because no method of calculation that is guaranteed to terminate can ever offer infinite precision: consider, for example, how to calculate the area of a circle—there will necessarily be loss of precision in the value used for π, which will propagate into the result.

Conclusion

  1. Real world measurements should use binary floating point: it's fast, compact, extremely precise and no worse than anything else (including the decimal version from which you started). Since MySQL's floating-point datatypes are IEEE754, this is exactly what they offer.

  2. Currency applications should use denary fixed point: whilst it's slow and wastes memory, it ensures both that values are not rounded to inexact quantities and that pennies are not lost on large monetary sums. Since MySQL's fixed-point datatypes are BCD-encoded strings, this is exactly what they offer.

Finally, bear in mind that most programming languages represent fractional values using binary floating-point types: so even if your database stores values in another format, they'll probably get converted (with all the ensuing issues that entails) at the interface with your application code.

Which option is best in this case?

Hopefully I've convinced you that your values can safely (and should) be stored in floating point types without worrying about any "inaccuracies". Remember, they're more precise than your flimsy 3-significant-digit decimal representation ever was: you just have to ignore false precision (but one must always do that anyway, even if using a fixed-point decimal format).
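
For anyone who wants to check the significance figures quoted earlier for binary32 and binary64, a small sketch follows. The helper name `significance` is hypothetical; the tolerance it reports is the worst-case relative rounding error of half a unit in the last place.

```python
import math

def significance(bits: int) -> tuple[float, float]:
    """Return (decimal digits, relative tolerance in percent) for a `bits`-bit significand."""
    digits = math.log10(2 ** bits)
    # Worst-case relative rounding error is half a unit in the last place.
    tolerance_percent = 2.0 ** -bits * 100
    return digits, tolerance_percent

for name, bits in (("binary32", 24), ("binary64", 53)):
    digits, tolerance = significance(bits)
    print(f"{name}: {digits:.2f} digits, ±{tolerance:.2e}%")
```

Running it gives a little over 7 digits for binary32 and a little under 16 for binary64.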

As for your question: choose either option 1 or 2 over option 3, because it makes comparisons easier (for example, to find the maximum mass, one could just use MAX(mass), whereas to do it efficiently across two columns would require some nesting).
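
To make the comparison point concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made purely so the example is self-contained; the table and column names are invented). The two-column query below sorts on both columns rather than nesting, which is just one of several ways to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE single_col (mass_oz REAL)")
conn.execute("CREATE TABLE two_col (pounds INTEGER, ounces REAL)")

weights = [(5, 6.2), (5, 11.0), (4, 15.9)]  # (pounds, ounces)
conn.executemany("INSERT INTO two_col VALUES (?, ?)", weights)
conn.executemany(
    "INSERT INTO single_col VALUES (?)",
    [(lb * 16 + oz,) for lb, oz in weights],
)

# One column: a plain aggregate is enough.
max_single = conn.execute("SELECT MAX(mass_oz) FROM single_col").fetchone()[0]

# Two columns: the query has to know how the parts combine.
max_two = conn.execute(
    "SELECT pounds, ounces FROM two_col "
    "ORDER BY pounds DESC, ounces DESC LIMIT 1"
).fetchone()

print(max_single)  # 91.0
print(max_two)     # (5, 11.0)
conn.close()
```

Every other comparison (ranges, sorting, averages) needs the same extra care with two columns, which is the awkwardness being described.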

Generally speaking, between those two options it wouldn't much matter which one chooses—floating point numbers are stored with a constant number of significant bits irrespective of their scale (indeed, it could be that some values are rounded to numbers that are closer to their original decimal representation using option 1 whilst simultaneously others are rounded to numbers that are closer to their original decimal representation using option 2: it simply depends how well each particular value can be represented in binary).

In this case, because it happens that there are 16 ounces to 1 pound (and 16 is a power of 2), the relative difference between the original decimal values and the numbers stored using the two approaches is identical:

  1. 5.3875₁₀ (not 5.33671875₁₀ as stated in your question) would be stored in a binary32 float as 101.011000110011001100110₂ (which is 5.38749980926513671875₁₀): this is 0.0000036% from the original value (but, as discussed above, the "original value" was already a pretty lousy representation of the physical quantity it represents).

    Knowing that a binary32 float stores only 7 decimal digits of precision, our compiler knows for certain that everything from the 8th digit onwards is definitely false precision and therefore must be ignored in every case; thus, provided that our input value didn't require more precision than that (and if it did, binary32 was obviously the wrong choice of format), this guarantees a return to a decimal value that looks just as round as that from which we started: 5.387500₁₀. However, we should really apply domain knowledge at this point (as we should with any storage format) to discard any further false precision that might exist, such as those two trailing zeroes.

  2. 86.2₁₀ would be stored in a binary32 float as 1010110.00110011001100110₂ (which is 86.1999969482421875₁₀): this is also 0.0000036% from the original value. As before, we then ignore false precision.

Notice how the binary representations of the numbers are identical, except for the placement of the radix point (which is four bits apart):

101.0110 00110011001100110
101 0110.00110011001100110

This is because 5.3875 × 2⁴ = 86.2.
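
The stored values and the shared bit pattern can be checked from Python with the standard `struct` module, which rounds a Python float (binary64) to binary32 when packing with the `f` format. The helper names `as_binary32` and `fields` are made up for this sketch.

```python
import struct
from decimal import Decimal

def as_binary32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

def fields(value: float) -> tuple[int, int]:
    """Return (biased exponent, 23-bit stored fraction) of `value` as binary32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

pounds, ounces = as_binary32(5.3875), as_binary32(86.2)

# Decimal(float) prints the exact value that was stored.
print(Decimal(pounds))  # 5.38749980926513671875
print(Decimal(ounces))  # 86.1999969482421875

# Same fraction bits; the exponents differ by 4 because 16 = 2**4.
print(fields(5.3875)[1] == fields(86.2)[1])     # True
print(fields(86.2)[0] - fields(5.3875)[0])      # 4

# Keeping only 7 significant digits recovers the round-looking inputs.
print(f"{pounds:.7g}", f"{ounces:.7g}")         # 5.3875 86.2
```

Because the two stored numbers differ by an exact power of two, their relative error against the original decimal values is the same.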

As an aside: being European (albeit British), I also have a strong aversion to imperial units of measurement—handling values of different scales is just so messy. I'd almost certainly store masses in SI units (e.g. kilograms or grams) and then perform conversions to imperial units as required within the presentation layer of my application. Plus rigidly adhering to SI units might one day save you from losing $125m.

#2


6  

I’d be tempted to store it in a metric unit, as they tend to be simple decimals and not complex values like pounds and ounces. That way, you can just store the one value (i.e. 103.25 kg) rather than the pounds–ounces equivalent, and it’s easier to perform conversions.

This is something I’ve dealt with in the past. I do a lot of work on pro wrestling and mixed martial arts (MMA) websites where fighters’ heights and weights need to be recorded. They tend to be displayed as feet and inches and pounds and ounces, but I still store the values as their centimetre and kilogram equivalents, and then do the conversion when displaying on the site.
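
A display-time conversion along those lines might look like the following sketch. The function names are hypothetical, and rounding to one decimal place of an ounce is an assumption taken from the "y.y ounces" format in the question; the pound is taken as exactly 453.59237 g.

```python
GRAMS_PER_POUND = 453.59237   # international avoirdupois pound, exact by definition
OUNCES_PER_POUND = 16

def kg_to_lb_oz(kilograms: float) -> tuple[int, float]:
    """Convert a stored mass in kilograms to (whole pounds, ounces to 0.1 oz)."""
    total_ounces = kilograms * 1000 / GRAMS_PER_POUND * OUNCES_PER_POUND
    # Round before splitting so that 15.96 oz carries into the pounds
    # instead of being displayed as "16.0 oz".
    total_ounces = round(total_ounces, 1)
    pounds, ounces = divmod(total_ounces, OUNCES_PER_POUND)
    return int(pounds), round(ounces, 1)

def format_weight(kilograms: float) -> str:
    """Render a stored mass for display in pounds and ounces."""
    pounds, ounces = kg_to_lb_oz(kilograms)
    return f"{pounds} lb {ounces:.1f} oz"

print(format_weight(2.443729))  # 5 lb 6.2 oz
print(format_weight(103.25))    # 227 lb 10.0 oz
```

The database column stays a single number in kilograms; only this presentation code knows about pounds and ounces.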

#3


0  

First, I had not known about how floating point numbers were inaccurate - thankfully a search later helped me understand: Floating Point Inaccuracy Examples

I would fully agree with @eggyal - keep the data in a single format in a single column. This allows you to expose it to the application and let the application deal with the presentation of it - be it in lbs/oz, rounded up lbs, whatever.

The database should keep the raw data while the presentation layer dictates the layout.

#1


27  

TL;DR

Choose either option #1 or option #2—there's no difference between them. Don't use option #3, because it's awkward to work with.

选择选项#1或选项#2 - 它们之间没有区别。不要使用选项#3,因为使用它很尴尬。

You claim that there are inherent inaccuracies in floating point numbers. I think that this deserves a little explanation.

您声称浮点数存在固有的不准确性。我认为这值得一点解释。

When deciding upon a numeral system for representing a number (whether on a piece of paper, in a computer circuit, or elsewhere), there are two separate issues to consider:

在决定用于表示数字的数字系统时(无论是在纸上,在计算机电路中还是在其他地方),需要考虑两个单独的问题:

  1. its basis; and

    它的基础;和

  2. its format.

    它的格式。

Pick a base, any base…

Limited by finite space, one cannot represent an arbitrary member of an infinite set. For example: no matter how much paper you buy or how small your handwriting, it'd always be possible to find an integer that won't fit in the given space (you could just keep appending extra digits until the paper runs out). So, with integers, we usually restrict our finite space to representing only those that fall within some particular interval—e.g. if we have space for three digits, we might restrict ourselves to the interval [-999,999].

受有限空间的限制,不能代表无限集的任意成员。例如:无论您购买多少纸张或手写多少,总是有可能找到一个不适合给定空间的整数(您可以在纸张用完之前保持附加额外的数字)。因此,对于整数,我们通常将有限空间限制为仅代表属于某个特定区间的那些 - 例如。如果我们有三位数的空间,我们可能会将自己限制在[-999,999]区间。

Every non-empty interval contains an infinite set of real numbers. In other words, no matter what interval one takes over the real numbers—be it [-999,999], [0,1], [0.000001,0.000002] or anything else—there is still an infinite set of reals within that interval! Therefore arbitrary real numbers must always be "rounded" to something that can be represented in finite space.

每个非空间隔包含一组无穷大的实数。换句话说,无论一个区间接管实数 - 无论是[-999,999],[0,1],[0.000001,0.000002]还是其他任何东西 - 在该区间内仍有无限的实数组!因此,任意实数必须始终“舍入”到可以在有限空间中表示的东西。

The set of real numbers that can be represented in finite space depends upon the numeral system that is used. In our (familiar) positional base-10 system, finite space will suffice for one-half (0.510) but not for one-third (0.33333…10); by contrast, in the (less familiar) positional base-9 system, it is the other way around (those same numbers are respectively 0.44444…9 and 0.39). Irrational numbers always require infinite space in standard positional systems. The consequence of all this is that some numbers that can be represented using only a small amount of space in positional base-10 (and therefore appear to be very "round" to us humans) would actually require infinite binary circuits for storage (and therefore don't appear to be very "round" to our digital friends)!

可以在有限空间中表示的实数集取决于所使用的数字系统。在我们(熟悉的)位置基10系统中,有限空间足够一半(0.510)但不足三分之一(0.33333 ... 10);相比之下,在(不太熟悉的)位置基础9系统中,它是另一种方式(那些相同的数字分别为0.44444 ... 9和0.39)。无理数在标准位置系统中总是需要无限空间。所有这一切的结果是,一些数字只能在位置基数10中使用少量空间来表示(因此看起来对我们人类来说非常“圆”)实际上需要无限的二进制电路来存储(因此对我们的数字朋友来说似乎并非“圆润”)!

We can't do any better for continuous quantities. Ultimately such quantities must use a finite representation in some numeral system: it's arbitrary whether that system happens to be easy on computer circuits, on human fingers, on something else or on nothing at all—whichever system is used, the value must be rounded and therefore it always results in "representation error".

对于连续数量,我们不能做得更好。最终,这样的数量必须在某个数字系统中使用有限的表示:无论系统在计算机电路,人类手指,其他东西上是否容易,或者根本没有任何系统,它是任意的,无论使用哪个系统,该值必须是四舍五入的。因此它总是导致“表示错误”。

In other words, even if one has a perfectly precise measuring instrument (which is physically impossible), then any measurement it reports will already have been rounded to a number that happens to fit on its display (in whatever base it uses—typically decimal, for obvious reasons). So, "86.2 oz" is never actually "86.2 oz" but rather a representation of "something between 86.1500000... oz and 86.2499999... oz". (Actually, because in reality the instrument is imperfect, all we can ever really say is that we have some degree of confidence that the actual value falls within that interval—but that is definitely departing some way from the point here).

换句话说,即使一个人拥有一个完全精确的测量仪器(在物理上是不可能的),那么它报告的任何测量都已经四舍五入到恰好适合其显示器的数字(无论它使用什么基数 - 通常为十进制,显而易见的原因)。因此,“86.2盎司”实际上从来不是“86.2盎司”,而是“86.1500000 ...盎司和86.2499999 ......盎司之间的东西”的表示。 (实际上,因为实际上仪器是不完美的,我们所能真正说的是我们对实际值落在该区间内有一定程度的信心 - 但这肯定是偏离了这里的某种方式)。

But we can do better for discrete quantities. Such values are not "arbitrary real numbers" and therefore none of the above applies to them: they can be represented exactly in the numeral system in which they were defined—and indeed, should be (as converting to another numeral system and truncating to a finite length would result in rounding to an inexact number). Computers can (inefficiently) handle such situations by representing the number as a string: e.g. consider ASCII or BCD encoding.

但我们可以为离散量做得更好。这些值不是“任意实数”,因此以上都不适用于它们:它们可以精确地表示在它们被定义的数字系统中 - 实际上应该是(转换为另一个数字系统并截断为有限长度会导致四舍五入到不精确的数字)。计算机可以(低效率地)通过将数字表示为字符串来处理这种情况:例如考虑ASCII或BCD编码。

Apply a format…

Since it's a property of the numeral system's (somewhat arbitrary) basis, whether or not a value appears to be "round" has no bearing on its precision. That's a really important observation, which runs counter to many people's intuition (and it's the reason I spent so much time explaining numerical basis above).

由于它是数字系统(有些任意)基础的属性,因此值是否为“圆形”与其精度无关。这是一个非常重要的观察,与许多人的直觉背道而驰(这就是我花了这么多时间解释数字基础的原因)。

Precision is instead determined by how many significant figures a representation has. We need a storage format that is capable of recording our values to at least as many significant figures as we consider them to be correct. Taking by way of example values that we consider to be correct when stated as 86.2 and 0.0000862, the two most common options are:

精确度取决于表示有多少重要数字。我们需要一种存储格式,能够将我们的值记录到至少与我们认为正确无关的重要数字。以86.2和0.0000862表示我们认为正确的值为例,两个最常见的选项是:

  • Fixed point, where the number of significant figures depends on magnitude: e.g. in fixed 5-decimal-point representation, our values would be stored as 86.20000 and 0.00009 (and therefore have 7 and 1 significant figures of precision respectively). In this example, precision has been lost in the latter value (and indeed, it wouldn't take much more for us to have been totally unable to represent anything of significance); and the former value stored false precision, which is a waste of our finite space (and indeed, it wouldn't take much more for the value to become so large that it overflows the storage capacity).

    固定点,其中有效数字的数量取决于幅度:例如在固定的5位小数表示中,我们的值将存储为86.20000和0.00009(因此分别具有7和1个有效精度数字)。在这个例子中,精确度已经在后一个值中丢失了(实际上,我们完全无法代表任何重要的东西,这不会花费太多时间);并且前一个值存储了假精度,这是对我们有限空间的浪费(实际上,它不会花费更多的时间来使该值变得如此之大以至于它会溢出存储容量)。

    A common example of when this format might be appropriate is for an accounting system: currency must usually be tracked to the penny irrespective of the monetary sum (therefore less precision is required for small values, but greater precision is required for large values). As it happens, currency is usually also considered to be discrete (pennies are indivisible), so this is also a good example of a situation where a particular basis (decimal for most modern currencies) is desirable to avoid the representation errors discussed above.

    这种格式适用的一个常见例子是会计系统:货币通常必须跟踪到一分钱而不管货币总和(因此小值需要较低的精度,但大值需要更高的精度)。实际上,货币通常也被认为是离散的(便士是不可分割的),所以这也是一个很好的例子,其中特定基础(大多数现代货币的十进制)是理想的,以避免上面讨论的表示错误。

    One usually implements fixed point storage by treating one's values as quotients over a common denominator and storing the numerator as an integer. In our example, the common denominator could be 105, so instead of 86.20000 and 0.00009 one would store the integers 8620000 and 9 and remember that they must be divided by 100000.

    人们通常通过将一个值作为公义分母上的商并将分子存储为整数来实现定点存储。在我们的例子中,公分母可以是105,所以不是86.20000和0.00009,而是存储整数8620000和9,并记住它们必须除以100000。

  • Floating point, where the number of significant figures is constant irrespective of magnitude: e.g. in 5-significant-figure decimal representation, our values would be stored as 86.200 and 0.000086200 (and, by definition, have 5 significant figures of precision both times). In this example, both values have been stored without any loss of precision; and they both also have the same amount of false precision, which is less wasteful (and we can therefore use our finite space to represent a far greater range of values—both large and small).

    浮点,其中有效数字的数量是恒定的,与幅度无关:例如,在5位有效数字十进制表示中,我们的值将存储为86.200和0.000086200(根据定义,两次都有5位有效精度数字)。在这个例子中,两个值都已存储而没有任何精度损失;并且它们都具有相同数量的错误精度,这样可以减少浪费(因此我们可以使用有限空间来表示更大范围的值 - 无论大小)。

    A common example of when this format might be appropriate is for recording any real world measurements: the precision of measuring instruments (which all suffer from both systematic and random errors) is fairly constant irrespective of scale so, given sufficient significant figures (typically around 3 or 4 digits), absolutely no precision is lost even if a change of base resulted in rounding to a different number.

    这种格式适用的一个常见例子是记录任何真实世界的测量结果:测量仪器的精度(均受系统误差和随机误差影响)是相当恒定的,无论尺度如何,给定足够的有效数字(通常约为3)或者4位数),即使基数的变化导致舍入到不同的数字,也绝对不会丢失精度。

    One usually implements floating point storage by treating one's values as integer significands with integer exponents. In our example, the significand could be 86200 for both values whereupon the (base-10) exponent would be -4 and -9 respectively.

    一个人通常通过将一个值作为带有整数指数的整数有效数来处理浮点存储。在我们的例子中,两个值的有效数可能是86200,因此(base-10)指数分别为-4和-9。

    But how precise are the floating point storage formats used by our computers?

    但是我们的计算机使用的浮点存储格式有多精确?

    • An IEEE754 single precision (binary32) floating point number has 24 bits, or log10(224) (over 7) digits, of significance—i.e. it has a tolerance of less than ±0.000006%. In other words, it is more precise than saying "86.20000".

      IEEE754单精度(binary32)浮点数具有24位,或log10(224)(超过7)个数字,具有重要意义,即。它的公差小于±0.000006%。换句话说,它比说“86.20000”更精确。

    • An IEEE754 double precision (binary64) floating point number has 53 bits, or log10(253) (almost 16) digits, of significance—i.e. it has a tolerance of just over ±0.00000000000001%. In other words, it is more precise than saying "86.2000000000000".

      IEEE754双精度(二进制64)浮点数具有53位,或log10(253)(几乎16)位,具有重要意义,即。它的公差刚好超过±0.00000000000001%。换句话说,它比说“86.2000000000000”更精确。

    The most important thing to realise is that these formats are, respectively, over ten thousand and over one trillion times more precise than saying "86.2"—even though their representations in binary happen to round to numbers that appear less "exact" in decimal (more on this shortly)!

    最重要的是要知道这些格式分别超过一万次,比说“86.2”精确度超过一万亿次 - 尽管它们在二进制中的表示恰好围绕着十进制中看起来不那么“精确”的数字(更多关于这一点)!

Notice also that both fixed and floating point formats will result in loss of precision when a value is known more precisely than the format supports. Such rounding errors can propagate in arithmetic operations to yield apparently erroneous results (which no doubt explains your reference to the "inherent inaccuracies" of floating point numbers): for example, 13 × 3000 in 5-place fixed point would yield 999.99000 rather than 1000.00000; and 1081325 in 5-significant figure floating point would yield 0.0034600 rather than 0.0034568.

另请注意,当比已知格式支持的值更精确地知道值时,固定和浮点格式都会导致精度损失。这种舍入误差可以在算术运算中传播,产生明显错误的结果(这无疑解释了你对浮点数的“固有不准确性”的引用):例如,5位固定点的1/3×3000将产生999.99000而不是超过1000.00000;和10/81 - 3/25的5有效数字浮点数将产生0.0034600而不是0.0034568。

The field of numerical analysis is dedicated to understanding these effects, but it is important to realise that any usable system (even performing calculations in your head) is vulnerable to such problems because no method of calculation that is guaranteed to terminate can ever offer infinite precision: consider, for example, how to calculate the area of a circle—there will necessarily be loss of precision in the value used for π, which will propagate into the result.

数值分析领域致力于理解这些影响,但重要的是要认识到任何可用的系统(甚至在你脑中进行计算)都容易受到这些问题的影响,因为没有任何保证终止的计算方法可以提供无限的精度。例如,考虑如何计算圆的面积 - 用于π的值必然会丢失精度,这将传播到结果中。

Conclusion

  1. Real world measurements should use binary floating point: it's fast, compact, extremely precise and no worse than anything else (including the decimal version from which you started). Since MySQL's floating-point datatypes are IEEE754, this is exactly what they offer.


  2. Currency applications should use denary fixed point: whilst it's slow and wastes memory, it ensures both that values are not rounded to inexact quantities and that pennies are not lost on large monetary sums. Since MySQL's fixed-point datatypes are BCD-encoded strings, this is exactly what they offer.


Finally, bear in mind that most programming languages represent fractional values using binary floating-point types: so even if your database stores values in another format, they'll probably get converted (with all the ensuing issues that entails) at the interface with your application code.


Which option is best in this case?

Hopefully I've convinced you that your values can safely (and should) be stored in floating point types without worrying about any "inaccuracies"? Remember, they're more precise than your flimsy 3-significant-digit decimal representation ever was: you just have to ignore false precision (but one must always do that anyway, even if using a fixed-point decimal format).


As for your question: choose either option 1 or 2 over option 3—it makes comparisons easier (for example, to find the maximum mass, one could just use MAX(mass), whereas to do it efficiently across two columns would require some nesting).

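A rough sketch of that difference follows. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table and column names (`single_col`, `two_col`, `mass_oz`, `pounds`, `ounces`) are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE single_col (mass_oz REAL)")
conn.execute("CREATE TABLE two_col (pounds INTEGER, ounces REAL)")

rows = [(5, 6.2), (5, 11.0), (4, 15.9)]
conn.executemany("INSERT INTO two_col VALUES (?, ?)", rows)
conn.executemany(
    "INSERT INTO single_col VALUES (?)",
    [(lb * 16 + oz,) for lb, oz in rows],
)

# Single column: one aggregate is enough.
heaviest_single = conn.execute(
    "SELECT MAX(mass_oz) FROM single_col"
).fetchone()[0]

# Two columns: find the largest pounds value first, then the largest
# ounces value among the rows that share it (a nested query).
heaviest_split = conn.execute(
    """
    SELECT pounds, MAX(ounces)
    FROM two_col
    WHERE pounds = (SELECT MAX(pounds) FROM two_col)
    GROUP BY pounds
    """
).fetchone()

print(heaviest_single)  # 91.0
print(heaviest_split)   # (5, 11.0)
```

The same shape of query should carry over to MySQL, though the syntax details there are not checked here.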

Generally speaking, between those two options it wouldn't much matter which one chooses—floating point numbers are stored with a constant number of significant bits irrespective of their scale (indeed, it could be that some values are rounded to numbers that are closer to their original decimal representation using option 1 whilst simultaneously others are rounded to numbers that are closer to their original decimal representation using option 2: it simply depends how well each particular value can be represented in binary).


In this case, because it happens that there are 16 ounces to 1 pound (and 16 is a power of 2), the relative difference between the original decimal values and the numbers stored using the two approaches is identical:


  1. 5.3875₁₀ (not 5.33671875₁₀ as stated in your question) would be stored in a binary32 float as 101.011000110011001100110₂ (which is 5.38749980926513671875₁₀): this is 0.0000036% from the original value (but, as discussed above, the "original value" was already a pretty lousy representation of the physical quantity it represents).


    Knowing that a binary32 float stores only 7 decimal digits of precision, our compiler knows for certain that everything from the 8th digit onwards is definitely false precision and therefore must be ignored in every case. Thus, provided that our input value didn't require more precision than that (and if it did, binary32 was obviously the wrong choice of format), this guarantees a return to a decimal value that looks just as round as that from which we started: 5.387500₁₀. However, we should really apply domain knowledge at this point (as we should with any storage format) to discard any further false precision that might exist, such as those two trailing zeroes.


  2. 86.2₁₀ would be stored in a binary32 float as 1010110.00110011001100110₂ (which is 86.1999969482421875₁₀): this is also 0.0000036% from the original value. As before, we then ignore false precision.


Notice how the binary representations of the numbers are identical, except for the placement of the radix point (which is four bits apart):


101.0110 00110011001100110
101 0110.00110011001100110

This is because 5.3875 × 2⁴ = 86.2.

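This can be checked with a small sketch, assuming Python's `struct` module (whose `"f"` format packs an IEEE 754 binary32 value) and the exact float-to-`Decimal` conversion. The helper names are made up for illustration.

```python
import struct
from decimal import Decimal

def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def binary32_fields(x):
    """Return (biased exponent, 23-bit stored significand) of x as binary32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

pounds, ounces = 5.3875, 86.2
stored_lb = to_binary32(pounds)
stored_oz = to_binary32(ounces)

exp_lb, frac_lb = binary32_fields(pounds)
exp_oz, frac_oz = binary32_fields(ounces)

print(Decimal(stored_lb))  # 5.38749980926513671875
print(Decimal(stored_oz))  # 86.1999969482421875

# Same significand bits; the exponents differ by 4 because 16 = 2**4.
print(frac_lb == frac_oz, exp_oz - exp_lb)  # True 4

# The relative errors of the two stored values are therefore identical.
rel_err_lb = (pounds - stored_lb) / pounds
rel_err_oz = (ounces - stored_oz) / ounces
print(rel_err_lb == rel_err_oz)  # True
```

Scaling by a power of two only changes the exponent field, so neither option gains or loses any precision relative to the other.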

As an aside: being European (albeit British), I also have a strong aversion to imperial units of measurement—handling values of different scales is just so messy. I'd almost certainly store masses in SI units (e.g. kilograms or grams) and then perform conversions to imperial units as required within the presentation layer of my application. Plus rigidly adhering to SI units might one day save you from losing $125m.


#2


6  

I’d be tempted to store it in a metric unit, as they tend to be simple decimals and not complex values like pounds and ounces. That way, you can just store the one value (e.g. 103.25 kg) rather than the pounds–ounces equivalent, and it’s easier to perform conversions.


This is something I’ve dealt with in the past. I do a lot of work on pro wrestling and mixed martial arts (MMA) websites where fighters’ heights and weights need to be recorded. They tend to be displayed as feet and inches and pounds and ounces, but I still store the values as their centimetre and kilogram equivalents, and then do the conversion when displaying on the site.

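A minimal sketch of that display-time conversion might look like the following. It assumes the mass is stored in grams and shown to one decimal place of an ounce; the function name `format_imperial` is hypothetical, and the only fixed fact used is that the international pound is defined as exactly 453.59237 g.

```python
GRAMS_PER_POUND = 453.59237  # exact, by definition of the international pound
OUNCES_PER_POUND = 16

def format_imperial(grams):
    """Format a mass stored in grams as 'X lb Y.Y oz' for display."""
    total_ounces = grams / GRAMS_PER_POUND * OUNCES_PER_POUND
    # Work in whole tenths of an ounce so that rounding can never
    # produce a display such as "5 lb 16.0 oz".
    tenths = round(total_ounces * 10)
    pounds, rem_tenths = divmod(tenths, OUNCES_PER_POUND * 10)
    return f"{pounds} lb {rem_tenths / 10:.1f} oz"

print(format_imperial(2443.728893375))  # 5 lb 6.2 oz
print(format_imperial(453.59237))       # 1 lb 0.0 oz
```

Rounding the total before splitting it into pounds and ounces is a deliberate choice here: a value just under a whole pound rolls over cleanly instead of showing sixteen ounces.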

#3


0  

First, I had not known how floating point numbers were inaccurate - thankfully, a later search helped me understand: Floating Point Inaccuracy Examples


I would fully agree with @eggyal - keep the data in a single format in a single column. This allows you to expose it to the application and let the application deal with the presentation of it - be it in lbs/oz, rounded up lbs, whatever.


The database should keep the raw data while the presentation layer dictates the layout.
