I'm working with some probabilities that get very small, which causes problems. For example:
probs <- c(4.225867e-03,3.463125e-04,2.480971e-05,1.660538e-06,1.074064e-07,6.829168e-09,4.305051e-10,2.702241e-11,1.692533e-12,1.058970e-13,6.622117e-15,4.139935e-16,2.587807e-17,1.617488e-18,1.010964e-19,6.318630e-21,3.949177e-22,2.468246e-23,1.542657e-24,9.641616e-26,6.026013e-27,3.766259e-28,2.353912e-29,1.471195e-30,9.194971e-32)
However, any arithmetic with this vector causes everything after the 12th entry to round off to zero (probably because those values are less than .Machine$double.eps). For example:
probs > 0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
but
1-probs < 1
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
I've tried using the gmp package, but I'm doing combinatorics-based calculations, and as.bigq(probs) gets really slow when raised to large powers.
Any ways to get around this?
1 Solution
#1
6
The case of very small probabilities comes up often in machine learning and other statistical computing topics. You are getting a precision error because of the limitations of the internal representation of floating point numbers. This can be solved using arbitrary precision arithmetic, but that is not commonly done.
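To make the limitation concrete, here is a minimal R sketch. The variable names are illustrative, and `small` is simply the 13th entry of the vector in the question.

```r
# .Machine$double.eps is the smallest positive x for which 1 + x != 1.
eps <- .Machine$double.eps
small <- 2.587807e-17      # 13th entry of probs in the question

small > 0          # TRUE: the value itself is stored without trouble
(1 - small) < 1    # FALSE: the difference is rounded back to exactly 1
small < eps        # TRUE: far below the resolution available near 1
```

The tiny value is not lost on its own; it is lost only when it is combined with a number many orders of magnitude larger, such as 1.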
The most popular solution is to represent your probabilities on a log scale and then use addition instead of multiplication. The transformed values are usually referred to as log-likelihoods (or log-probabilities). This transformation avoids the problem of very small numbers, and in addition, the log-likelihood values can be used directly to compare the probability of things: because the logarithm is monotonically increasing, a lower log-likelihood always means a lower probability.
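As a rough illustration of working on the log scale, the sketch below uses made-up values (they are not from the question) to show a product and a large power that underflow to zero in ordinary arithmetic but stay finite as logs.

```r
# Hypothetical example values, not taken from the question.
p <- rep(1e-20, 20)

prod(p)                   # 0: the true product, 1e-400, underflows
log_prod <- sum(log(p))   # log of the product, a finite negative number

# Raising to a large power becomes a multiplication in log space.
q <- 1e-3
k <- 1000
q^k                       # 0: 1e-3000 underflows
log_pow <- k * log(q)     # log(q^k), also finite

# Comparisons can be made directly on the log values.
log_pow < log_prod        # TRUE: q^k is the smaller of the two quantities
```

Since the question mentions raising probabilities to large powers, the `k * log(q)` form may be the most relevant part: it costs one multiplication regardless of how large `k` is.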
Note that there is a subtle distinction between likelihood and probability, but the log transformation works regardless: it turns very small numbers into negative numbers of moderate magnitude, which floating point represents without trouble.
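For the `1 - probs` step in the question, one option that stays on the log scale is base R's `log1p`, which computes log(1 + x) accurately for small x. This is only a sketch, under the assumption that the log of the complement is what the downstream calculation needs; the names are illustrative.

```r
small <- 2.587807e-17   # 13th entry of probs in the question

log(1 - small)          # 0: 1 - small was already rounded to 1
lc <- log1p(-small)     # approximately -small, so the tiny term is preserved
lc < 0                  # TRUE, whereas (1 - small) < 1 is FALSE

# The log of the probability itself is a moderately sized negative number.
log(small)              # roughly -38.2
```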