I understand how integer and floating point data types are stored, and I am guessing that the variable length of decimal data types means they are stored more like a string.
Does that imply a performance overhead when using decimal data types and searching against them?
4 Answers
#1
19
Pavel has it quite right; I'd just like to explain a little.
Presuming that you mean a performance impact as compared to floating point, or fixed-point-offset integer (i.e. storing thousandths of a cent as an integer): yes, there is very much a performance impact. PostgreSQL, and by the sounds of things MySQL, store DECIMAL / NUMERIC in binary-coded decimal. This format is more compact than storing the digits as text, but it's still not very efficient to work with.
If you're not doing many calculations in the database, the impact is limited to the greater storage space required for BCD as compared to integer or floating point, and thus the wider rows and slower scans, bigger indexes, etc. Comparison operations in b-tree index searches are also slower, but not enough to matter unless you're already CPU-bound for some other reason.
If you're doing lots of calculations with the DECIMAL / NUMERIC values in the database, then performance can really suffer. This is particularly noticeable, at least in PostgreSQL, because Pg can't use more than one CPU for any given query. If you're doing a huge bunch of division and multiplication, more complex maths, aggregation, etc. on numerics, you can start to find yourself CPU-bound in situations where you never would be when using a float or integer data type. This is particularly noticeable in OLAP-like (analytics) workloads, and in reporting or data transformation during loading or extraction (ETL).
Despite the fact that there is a performance impact (which varies based on workload from negligible to quite big), you should generally use numeric / decimal when it is the most appropriate type for your task - i.e. when very-high-range values must be stored and/or rounding error isn't acceptable.
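The rounding-error point is easy to demonstrate outside the database; this sketch uses Python's decimal module as a stand-in for SQL NUMERIC:

```python
from decimal import Decimal

# Summing a tenth a thousand times: exact with decimal, off with binary float,
# because 0.1 has no exact binary floating point representation.
float_total = sum(0.1 for _ in range(1000))
decimal_total = sum(Decimal("0.1") for _ in range(1000))

print(float_total == 100.0)             # False: accumulated rounding error
print(decimal_total == Decimal("100"))  # True: decimal arithmetic is exact
```

This is exactly the class of error that makes binary floating point unacceptable for currency, and it is why the performance cost of NUMERIC is usually worth paying there.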
Occasionally it's worth the hassle of using a bigint and a fixed-point offset, but that is clumsy and inflexible. Using floating point instead is very rarely the right answer due to all the challenges of working reliably with floating point values for things like currency.
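The bigint-with-fixed-point-offset approach mentioned above can be sketched like this, tracking currency as integer cents. The names and helpers here are illustrative, not from any particular codebase, and the sketch deliberately ignores negative amounts and other edge cases, which is part of why the approach is clumsy in practice:

```python
# Fixed-point money as integer cents: exact addition, explicit formatting.
CENTS = 100

def to_cents(dollars: str) -> int:
    # Parse a non-negative "D.CC" string into integer cents.
    whole, _, frac = dollars.partition(".")
    return int(whole) * CENTS + int(frac.ljust(2, "0")[:2])

def fmt(cents: int) -> str:
    return f"{cents // CENTS}.{cents % CENTS:02d}"

balance = to_cents("19.99") + to_cents("0.01")
print(fmt(balance))  # 20.00
```

Addition and comparison are now plain (and fast) integer operations, but every multiplication, division, or change of scale needs manual rescaling and rounding decisions, which is the inflexibility the answer refers to.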
(BTW, I'm quite excited that some new Intel CPUs, and IBM's POWER7 range of CPUs, include hardware support for IEEE 754 decimal floating point. If this ever becomes available in lower-end CPUs it'll be a huge win for databases.)
#2
9
The impact of the decimal type (the numeric type in Postgres) depends on usage. For typical OLTP workloads the impact may not be significant; for OLAP it can be relatively high. In our application, an aggregation over large columns of numeric values is many times slower than the same aggregation over double precision.
Although current CPUs are strong, the rule still holds: use numeric only when you need exact numbers or very large numbers. Elsewhere, use the float or double precision types.
#3
3
You are correct: fixed-point data is stored as a (packed BCD) string.
To what extent this impacts performance depends on a range of factors, which include:
- Do queries utilise an index upon the column?
- Can the CPU perform BCD operations in hardware, such as through Intel's BCD opcodes?
- Does the OS harness hardware support through library functions?
Overall, any performance impact is likely to be pretty negligible relative to other factors that you may face: so don't worry about it. Remember Knuth's maxim, "premature optimisation is the root of all evil".
#4
2
I am guessing that the variable length of decimal data types means it is stored more like a string.
Taken from the MySQL documentation here. The document says:
As of MySQL 5.0.3, values for DECIMAL columns are no longer represented as strings that require 1 byte per digit or sign character. Instead, a binary format is used that packs nine decimal digits into 4 bytes. This change to the DECIMAL storage format changes the storage requirements as well. The storage requirements for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires 4 bytes, and any remaining digits require some fraction of 4 bytes.
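The storage rule quoted above (4 bytes per full group of nine digits, with leftover digits taking a fraction of 4 bytes) can be sketched as a small calculator. The proportional ceil-based handling of leftover digits is my reading of the rule, matching the leftover-digit byte table in the MySQL manual; treat it as an approximation rather than authoritative:

```python
import math

def decimal_bytes(digits: int) -> int:
    """Approximate MySQL storage for one side (integer or fractional part)
    of a DECIMAL value: 4 bytes per full group of 9 digits, plus a
    proportional share of 4 bytes for any leftover digits."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + math.ceil(leftover * 4 / 9)

def decimal_storage(precision: int, scale: int) -> int:
    # The integer part gets (precision - scale) digits, the fraction gets scale.
    return decimal_bytes(precision - scale) + decimal_bytes(scale)

print(decimal_storage(18, 9))  # 9 + 9 digits -> 4 + 4 = 8 bytes
print(decimal_storage(10, 2))  # 8 + 2 digits -> 4 + 1 = 5 bytes
```

Compare this with the 1-byte-per-character pre-5.0.3 string format: DECIMAL(18,9) drops from roughly 18+ bytes to 8, which is the compactness gain the quoted passage describes.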