有效的浮点除法与常整数除数。

时间:2021-07-19 07:20:13

A recent question, whether compilers are allowed to replace floating-point division with floating-point multiplication, inspired me to ask this question.

最近的一个问题是,编译器是否允许用浮点乘法代替浮点除法,这启发了我问这个问题。

Under the stringent requirement, that the results after code transformation shall be bit-wise identical to the actual division operation, it is trivial to see that for binary IEEE-754 arithmetic, this is possible for divisors that are a power of two. As long as the reciprocal of the divisor is representable, multiplying by the reciprocal of the divisor delivers results identical to the division. For example, multiplication by 0.5 can replace division by 2.0.

在严格的要求下,代码转换后的结果与实际的除法运算是完全相同的,对于二进制的IEEE-754算法来说,这是很简单的,这对于两个幂的除数是可能的。只要除数的倒数是可表示的,乘以除数的倒数就能得到与除法相同的结果。例如,0。5的乘法可以用2.0代替除法。

One then wonders for what other divisors such replacements work, assuming we allow any short instruction sequence that replaces division but runs significantly faster, while delivering bit-identical results. In particular allow fused multiply-add operations in addition to plain multiplication. In comments I pointed to the following relevant paper:

假设我们允许使用任何短指令序列来替换分区,但运行速度明显快,同时提供了相同的结果,那么这样的替换工作又会产生什么效果呢?特别允许在普通乘法之外添加融合的多添加操作。在评论中,我指出了以下相关文件:

Nicolas Brisebarre, Jean-Michel Muller, and Saurabh Kumar Raina. Accelerating correctly rounded floating-point division when the divisor is known in advance. IEEE Transactions on Computers, Vol. 53, No. 8, August 2004, pp. 1069-1072.

Nicolas Brisebarre, Jean-Michel Muller,和Saurabh Kumar Raina。当除数被提前知道时,加速正确地圆的浮点除法。IEEE在电脑上的交易,第53卷,第8期,2004年8月,第1069-1072页。

The technique advocated by the authors of the paper precomputes the reciprocal of the divisor y as a normalized head-tail pair zh:zl as follows: zh = 1 / y, zl = fma (-y, zh, 1) / y. Later, the division q = x / y is then computed as q = fma (zh, x, zl * x). The paper derives various conditions that divisor y must satisfy for this algorithm to work. As one readily observes, this algorithm has problems with infinities and zero when the signs of head and tail differ. More importantly, it will fail to deliver correct results for dividends x that are very small in magnitude, because computation of the quotient tail, zl * x, suffers from underflow.

技术论文的作者所倡导的预计算的倒数除数y一双规范化首尾相接zh型:zl如下:zh型= 1 / y,zl =菲利普-马萨(可能是古银,1)/ y。之后,该部门q = x / y然后计算q =菲利普-马萨(zh型x,zl * x)推导各种条件因子y必须满足该算法工作。正如人们很容易观察到的那样,当头部和尾部的信号不同时,该算法存在不定式和零的问题。更重要的是,它将不能为股息x提供非常小的正确结果,因为商尾的计算,zl * x,受到了下流的影响。

The paper also makes a passing reference to an alternative FMA-based division algorithm, pioneered by Peter Markstein when he was at IBM. The relevant reference is:

这篇论文还提到了另一种基于fma的分割算法,该算法是由Peter Markstein在IBM工作时首创的。相关的引用:

P. W. Markstein. Computation of elementary functions on the IBM RISC System/6000 processor. IBM Journal of Research & Development, Vol. 34, No. 1, January 1990, pp. 111-119

p . w . Markstein。计算IBM RISC系统/6000处理器上的基本功能。《IBM研究与发展杂志》,第34卷,第1期,1990年1月,第111-119页。

In Markstein's algorithm, one first computes a reciprocal rc, from which an initial quotient q = x * rc is formed. Then, the remainder of the division is computed accurately with an FMA as r = fma (-y, q, x), and an improved, more accurate quotient is finally computed as q = fma (r, rc, q).

在Markstein的算法中,一个首先计算一个交互rc,它由一个初始商q = x * rc组成。然后,用FMA (-y, q, x)来精确计算除法的剩余部分,最后计算出q = FMA (r, rc, q)。

This algorithm also has issues for x that are zeroes or infinities (easily worked around with appropriate conditional execution), but exhaustive testing using IEEE-754 single-precision float data shows that it delivers the correct quotient across all possibe dividends x for many divisors y, among these many small integers. This C code implements it:

这个算法也有关于x的问题,它是0或infinities(很容易在适当的条件执行的情况下工作),但是使用IEEE-754单精度浮点数据的穷举测试表明,它在所有可能的情况下都提供了正确的商,在这些小整数中,有许多除数。这个C代码实现了它:

/* precompute reciprocal */
rc = 1.0f / y;

/* compute quotient q=x/y */
q = x * rc;
if ((x != 0) && (!isinf(x))) {
    r = fmaf (-y, q, x);
    q = fmaf (r, rc, q);
}

On most processor architectures, this should translate into a branchless sequence of instructions, using either predication, conditional moves, or select-type instructions. To give a concrete example: For division by 3.0f, the nvcc compiler of CUDA 7.5 generates the following machine code for a Kepler-class GPU:

在大多数处理器体系结构中,这应该转化为一个无分支的指令序列,使用谓词、条件移动或选择类型指令。举一个具体的例子:For division by 3.0f, CUDA 7.5的nvcc编译器为一个Kepler-class GPU生成以下机器码:

    LDG.E R5, [R2];                        // load x
    FSETP.NEU.AND P0, PT, |R5|, +INF , PT; // pred0 = fabsf(x) != INF
    FMUL32I R2, R5, 0.3333333432674408;    // q = x * (1.0f/3.0f)
    FSETP.NEU.AND P0, PT, R5, RZ, P0;      // pred0 = (x != 0.0f) && (fabsf(x) != INF)
    FMA R5, R2, -3, R5;                    // r = fmaf (q, -3.0f, x);
    MOV R4, R2                             // q
@P0 FFMA R4, R5, c[0x2][0x0], R2;          // if (pred0) q = fmaf (r, (1.0f/3.0f), q)
    ST.E [R6], R4;                         // store q

For my experiments, I wrote the tiny C test program shown below that steps through integer divisors in increasing order and for each of them exhaustively tests the above code sequence against the proper division. It prints a list of the divisors that passed this exhaustive test. Partial output looks as follows:

在我的实验中,我编写了一个小的C测试程序,它在递增顺序中通过整数除数来完成,并且对每一个都进行了详尽的测试。它打印通过这个详尽测试的因数的列表。部分输出如下:

PASS: 1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 15, 16, 17, 19, 21, 23, 25, 27, 29, 31, 32, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 64, 65, 67, 69,

To incorporate the replacement algorithm into a compiler as an optimization, a whitelist of divisors to which the above code transformation can safely be applied is impractical. The output of the program so far (at a rate of about one result per minute) suggests that the fast code works correctly across all possible encodings of x for those divisors y that are odd integers or are powers of two. Anecdotal evidence, not a proof, of course.

为了将替换算法合并到编译器中作为优化,可以安全地应用上面代码转换所能使用的一份白名单,这是不切实际的。到目前为止,这个程序的输出(以每分钟大约一个结果的速度)表明,快速代码能够正确地工作,在所有可能的编码中,对于那些奇数或者是两个整数的因数,都可以得到x的编码。当然不是证据,而是轶事证据。

What set of mathematical conditions can determine a-priori whether the transformation of division into the above code sequence is safe? Answers can assume that all the floating-point operations are performed in the default rounding mode of "round to nearest or even".

什么样的数学条件可以决定a-先验的划分是否在上面的代码序列中是安全的?答案可以假定所有浮点运算都是在“四舍五入”的默认舍入模式下执行的。

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main (void)
{
    float r, q, x, y, rc;
    volatile union {
        float f;
        unsigned int i;
    } arg, res, ref;
    int err;

    y = 1.0f;
    printf ("PASS: ");
    while (1) {
        /* precompute reciprocal */
        rc = 1.0f / y;

        arg.i = 0x80000000;
        err = 0;
        do {
            /* do the division, fast */
            x = arg.f;
            q = x * rc;
            if ((x != 0) && (!isinf(x))) {
                r = fmaf (-y, q, x);
                q = fmaf (r, rc, q);
            }
            res.f = q;
            /* compute the reference, slowly */
            ref.f = x / y;

            if (res.i != ref.i) {
                err = 1;
                break;
            }
            arg.i--;
        } while (arg.i != 0x80000000);

        if (!err) printf ("%g, ", y);
        y += 1.0f;
    }
    return EXIT_SUCCESS;
}

4 个解决方案

#1


6  

Let me restart for the third time. We are trying to accelerate

让我重新开始第三次。我们正在努力加速。

    q = x / y

where y is an integer constant, and q, x, and y are all IEEE 754-2008 binary32 floating-point values. Below, fmaf(a,b,c) indicates a fused multiply-add a * b + c using binary32 values.

其中y是一个整数常量,q、x和y都是IEEE 754-2008 binary32浮点值。下面,fmaf(a,b,c)表示一个融合的multiply,使用binary32值添加一个* b + c。

The naive algorithm is via a precalculated reciprocal,

朴素算法是通过预先计算的倒数,

    C = 1.0f / y

so that at runtime a (much faster) multiplication suffices:

因此,在运行时(更快)的乘法满足:

    q = x * C

The Brisebarre-Muller-Raina acceleration uses two precalculated constants,

Brisebarre-Muller-Raina加速度使用两个预先计算的常数,

    zh = 1.0f / y
    zl = -fmaf(zh, y, -1.0f) / y

so that at runtime, one multiplication and one fused multiply-add suffices:

因此,在运行时,一个乘法和一个融合的多重相加满足:

    q = fmaf(x, zh, x * zl)

The Markstein algorithm combines the naive approach with two fused multiply-adds that yields the correct result if the naive approach yields a result within 1 unit in the least significant place, by precalculating

Markstein算法结合了朴素的方法和两个融合的multiply-添加,如果天真的方法在最不重要的地方,通过预先计算得到一个结果,就会得到正确的结果。

    C1 = 1.0f / y
    C2 = -y

so that the divison can be approximated using

这样就可以用。

    t1 = x * C1
    t2 = fmaf(C1, t1, x)
    q  = fmaf(C2, t2, t1)

The naive approach works for all powers of two y, but otherwise it is pretty bad. For example, for divisors 7, 14, 15, 28, and 30, it yields an incorrect result for more than half of all possible x.

这个天真的方法适用于所有两个y的幂,但除此之外,它是非常糟糕的。例如,对于7、14、15、28和30的因数,它产生的结果不正确,超过了所有可能的x的一半。

The Brisebarre-Muller-Raina approach similarly fails for almost all non-power of two y, but much fewer x yield the incorrect result (less than half a percent of all possible x, varies depending on y).

类似于Brisebarre-Muller-Raina的方法几乎所有的非幂函数都失败了,但是更少的x产生了不正确的结果(在所有可能的x中还不到0.5%,取决于y)。

The Brisebarre-Muller-Raina article shows that the maximum error in the naive approach is ±1.5 ULPs.

Brisebarre-Muller-Raina文章表明,最大误差在±1.5就ulp进行天真的方法。

The Markstein approach yields correct results for powers of two y, and also for odd integer y. (I have not found a failing odd integer divisor for the Markstein approach.)

Markstein方法得到了两个y的幂的正确结果,以及奇数整数y(我还没有发现Markstein方法的一个失败的奇数整数除数)。


For the Markstein approach, I have analysed divisors 1 - 19700 (raw data here).

对于Markstein方法,我分析了1 - 19700(这里的原始数据)。

Plotting the number of failure cases (divisor in the horizontal axis, the number of values of x where Markstein approach fails for said divisor), we can see a simple pattern occur:

绘制失败案例的数量(横轴上的除数,在Markstein方法失败的x处的值的数目),我们可以看到一个简单的模式:

Markstein failure cases http://www.nominal-animal.net/answers/markstein.png

Markstein失败情况下,http://www.nominal-animal.net/answers/markstein.png

Note that these plots have both horizontal and vertical axes logarithmic. There are no dots for odd divisors, as the approach yields correct results for all odd divisors I've tested.

注意,这些图具有水平轴和垂直轴的对数。对于奇数的除数,没有点,因为这个方法对我测试过的所有奇数除数都有正确的结果。

If we change the x axis to the bit reverse (binary digits in reverse order, i.e. 0b11101101 → 0b10110111, data) of the divisors, we have a very clear pattern: Markstein failure cases, bit reverse divisor http://www.nominal-animal.net/answers/markstein-failures.png

如果我们改变x轴的扭转(二进制数字在相反的顺序,即0 b11101101→0 b10110111数据)的因数,我们有一个非常明确的模式:Markstein失败情况下,反向除数http://www.nominal-animal.net/answers/markstein-failures.png

If we draw a straight line through the center of the point sets, we get curve 4194304/x. (Remember, the plot considers only half the possible floats, so when considering all possible floats, double it.) 8388608/x and 2097152/x bracket the entire error pattern completely.

如果我们画一条直线穿过点集的中心,我们得到曲线4194304/x。(请记住,在考虑所有可能的浮点数时,plot只考虑了一半的浮点数,所以当考虑所有可能的浮点数时,将它翻倍。)8388608/x和2097152/x将整个错误模式全部括起来。

Thus, if we use rev(y) to compute the bit reverse of divisor y, then 8388608/rev(y) is a good first order approximation of the number of cases (out of all possible float) where the Markstein approach yields an incorrect result for an even, non-power-of-two divisor y. (Or, 16777216/rev(x) for the upper limit.)

因此,如果我们使用转速(y)计算因子的钻头扭转y,然后8388608 /牧师(y)是一个很好的一阶近似的情况下(所有可能的浮动),甚至Markstein方法产生不正确的结果,non-power-of-two除数y。(或者,16777216 /牧师(x)的上限。)

Added 2016-02-28: I found an approximation for the number of error cases using the Markstein approach, given any integer (binary32) divisor. Here it is as pseudocode:

添加了2016-02-28:我发现了使用Markstein方法的错误数的近似值,给出了任意整数(binary32)除数。这就是伪代码:

function markstein_failure_estimate(divisor):
    if (divisor is zero)
        return no estimate
    if (divisor is not an integer)
        return no estimate

    if (divisor is negative)
        negate divisor

    # Consider, for avoiding underflow cases,
    if (divisor is very large, say 1e+30 or larger)
        return no estimate - do as division

    while (divisor > 16777216)
        divisor = divisor / 2

    if (divisor is a power of two)
        return 0

    if (divisor is odd)
        return 0

    while (divisor is not odd)
        divisor = divisor / 2

    # Use return (1 + 83833608 / divisor) / 2
    # if only nonnegative finite float divisors are counted!
    return 1 + 8388608 / divisor

This yields a correct error estimate to within ±1 on the Markstein failure cases I have tested (but I have not yet adequately tested divisors larger than 8388608). The final division should be such that it reports no false zeroes, but I cannot guarantee it (yet). It does not take into account very large divisors (say 0x1p100, or 1e+30, and larger in magnitude) which have underflow issues -- I would definitely exclude such divisors from acceleration anyway.

这产生了一个正确的错误估计在±1 Markstein失败情况下我已经测试了(但我尚未充分测试了因数大于8388608)。最后的划分应该是这样的,它不会报告错误的零,但是我不能保证它(然而)。它没有考虑到非常大的因数(比如0x1p100,或者1e+30,以及更大的大小),这些都有不足的问题——我肯定会把这些除数排除在加速度之外。

In preliminary testing, the estimate seems uncannily accurate. I did not draw a plot comparing the estimates and the actual errors for divisors 1 to 20000, because the points all coincide exactly in the plots. (Within this range, the estimate is exact, or one too large.) Essentially, the estimates reproduce the first plot in this answer exactly.

在初步测试中,这一估计似乎惊人的准确。我没有画出一幅图,比较了1到2万之间的估计值和实际误差,因为这些点都恰好在图中。(在这个范围内,估计是准确的,或者是太大。)从本质上讲,这一估算准确地再现了这个答案的第一个情节。


The pattern of failures for the Markstein approach is regular, and very interesting. The approach works for all power of two divisors, and all odd integer divisors.

Markstein方法的失败模式是常规的,而且非常有趣。该方法适用于两个除数的所有幂,以及所有奇数整数除数。

For divisors greater than 16777216, I consistently see the same errors as for a divisor that is divided by the smallest power of two to yield a value less than 16777216. For example, 0x1.3cdfa4p+23 and 0x1.3cdfa4p+41, 0x1.d8874p+23 and 0x1.d8874p+32, 0x1.cf84f8p+23 and 0x1.cf84f8p+34, 0x1.e4a7fp+23 and 0x1.e4a7fp+37. (Within each pair, the mantissa is the same, and only the power of two varies.)

对于大于16777216的除数,我始终会看到与除数相同的错误,除数除以2的最小幂,得到小于16777216的值。例如,0x1.3cdfa4p+23和0x1.3cdfa4p+41, 0x1。d8874p + 23和0 x1。d8874p + 32 0 x1。cf84f8p + 23和0 x1。cf84f8p + 34岁,0 x1。e4a7fp + 23和0 x1.e4a7fp + 37。(在每一对中,尾数是一样的,只有两种不同的力量。)

Assuming my test bench is not in error, this means that the Markstein approach also works divisors larger than 16777216 in magnitude (but smaller than, say, 1e+30), if the divisor is such that when divided by the smallest power of two that yields a quotient of less than 16777216 in magnitude, and the quotient is odd.

假设我试验台没有错误,这意味着Markstein方法也适用因数大于16777216级(但说,小于1 e + 30),如果这样,当除以除数的最小功率两个收益率不到16777216级的商,商是奇数。

#2


7  

This question asks for a way to identify the values of the constant Y that make it safe to transform x / Y into a cheaper computation using FMA for all possible values of x. Another approach is to use static analysis to determine an over-approximation of the values x can take, so that the generally unsound transformation can be applied in the knowledge that the values for which the transformed code differs from the original division do not happen.

这个问题要求确定常数Y的值,使其安全的x / Y转换成更便宜的计算使用菲利普-马萨对于x的所有可能值。另一个方法是使用静态分析来确定over-approximation x的值,所以一般不健全的转换可以应用在知识转化代码的值不同于原来的部门不会发生。

Using representations of sets of floating-point values that are well adapted to the problems of floating-point computations, even a forwards analysis starting from the beginning of the function can produce useful information. For instance:

使用浮点值集合的表示,这些值可以很好地适应浮点计算的问题,甚至从函数的开始处开始的远期分析也能产生有用的信息。例如:

float f(float z) {
  float x = 1.0f + z;
  float r = x / Y;
  return r;
}

Assuming the default round-to-nearest mode(*), in the above function x can only be NaN (if the input is NaN), +0.0f, or a number larger than 2-24 in magnitude, but not -0.0f or anything closer to zero than 2-24. This justifies the transformation into one of the two forms shown in the question for many values of the constant Y.

假设默认的圆到最近的模式(*),在上面的函数x中只能是NaN(如果输入是NaN), +0.0f,或大于2-24的数值,但不是-0.0f或接近于0的值,而不是2-24。这证明了转换为两个形式中的一种,这两种形式都是关于Y的许多值的。

(*) assumption without which many optimizations are impossible and that C compilers already make unless the program explicitly uses #pragma STDC FENV_ACCESS ON

(*)假设没有这一假设,许多优化是不可能的,而且C编译器已经做出了,除非程序显式地使用#pragma STDC FENV_ACCESS。


A forwards static analysis that predicts the information for x above can be based on a representation of sets of floating-point values an expression can take as a tuple of:

前面的静态分析可以预测上面的x信息,它可以基于一组浮点值的表示,表达式可以取为:

  • a representation for the sets of possible NaN values (Since behaviors of NaN are underspecified, a choice is to use only a boolean, with true meaning some NaNs can be present, and false indicating no NaN is present.),
  • 对于可能的NaN值集合的表示(由于NaN的行为未被指定,一个选择是只使用一个布尔值,用true表示一些NaNs可以存在,而false表示没有NaN存在。)
  • four boolean flags indicating respectively the presence of +inf, -inf, +0.0, -0.0,
  • 四个布尔标志分别表示+inf, -inf, +0.0, -0.0,
  • an inclusive interval of negative finite floating-point values, and
  • 一个包含负有限浮点值的包含区间。
  • an inclusive interval of positive finite floating-point values.
  • 一个正有限浮点值的包含区间。

In order to follow this approach, all the floating-point operations that can occur in a C program must be understood by the static analyzer. To illustrate, the addition betweens sets of values U and V, to be used to handle + in the analyzed code, can be implemented as:

为了遵循这种方法,可以在C程序中发生的所有浮点操作都必须由静态分析器来理解。为了说明,在分析的代码中使用U和V来处理+的值,可以被实现为:

  • If NaN is present in one of the operands, or if the operands can be infinities of opposite signs, NaN is present in the result.
  • 如果NaN存在于一个操作数中,或者如果操作数可以是相反的符号,那么NaN就会出现在结果中。
  • If 0 cannot be a result of the addition of a value of U and a value of V, use standard interval arithmetic. The upper bound of the result is obtained for the round-to-nearest addition of the largest value in U and the largest value in V, so these bounds should be computed with round-to-nearest.
  • 如果0不能是加上U的值和V的值,则使用标准区间算术。结果的上界得到了U的最大值和V中最大值的圆到最近的加法,所以这些边界应该用圆到最近的方法计算。
  • If 0 can be a result of the addition of a positive value of U and a negative value of V, then let M be the smallest positive value in U such that -M is present in V.
    • if succ(M) is present in U, then this pair of values contributes succ(M) - M to the positive values of the result.
    • 如果在U中出现了succ(M),那么这对值就会对结果的正值产生succ(M) - M。
    • if -succ(M) is present in V, then this pair of values contributes the negative value M - succ(M) to the negative values of the result.
    • 如果-succ(M)存在于V中,那么这一对值就会将负的M -succ(M)贡献给结果的负值。
    • if pred(M) is present in U, then this pair of values contributes the negative value pred(M) - M to the negative values of the result.
    • 如果在U中存在pred(M),那么这一对值将导致负值pred(M) - M值为结果的负值。
    • if -pred(M) is present in V, then this pair of values contributes the value M - pred(M) to the positive values of the result.
    • 如果-pred(M)存在于V中,那么这一对值将M -pred(M)的值贡献给结果的正值。
  • 如果0可以添加积极的结果的价值U和V的负值,然后让M U - M,是最小的积极价值存在于诉如果succ(M)存在于U,然后这副值贡献succ(M)- M的积极价值观的结果。如果-succ(M)存在于V中,那么这一对值就会将负的M -succ(M)贡献给结果的负值。如果在U中存在pred(M),那么这一对值将导致负值pred(M) - M值为结果的负值。如果-pred(M)存在于V中,那么这一对值将M -pred(M)的值贡献给结果的正值。
  • Do the same work if 0 can be the result of the addition of a negative value of U and a positive value of V.
  • 如果0可以是U的负值加上V的正值,那么做同样的工作。

Acknowledgement: the above borrows ideas from “Improving the Floating Point Addition and Subtraction Constraints”, Bruno Marre & Claude Michel

确认:以上的想法来自于“改善浮点加减约束”,Bruno Marre & Claude Michel。


Example: compilation of the function f below:

示例:下面的函数f的编译:

float f(float z, float t) {
  float x = 1.0f + z;
  if (x + t == 0.0f) {
    float r = x / 6.0f;
    return r;
  }
  return 0.0f;
}

The approach in the question refuses to transform the division in function f into an alternate form, because 6 is not one of the value for which the division can be unconditionally transformed. Instead, what I am suggesting is to apply a simple value analysis starting from the beginning of the function which, in this case, determines that x is a finite float either +0.0f or at least 2-24 in magnitude, and to use this information to apply Brisebarre et al's transformation, confident in the knowledge that x * C2 does not underflow.

这个问题的方法拒绝将函数f的分割转化为另一种形式,因为6不是可以无条件转换的值。相反,我建议的是应用价值分析从一开始简单的函数,在这种情况下,确定x是一个有限浮动+ 0.0 f或至少2-24大小,并使用这些信息来应用Brisebarre等的变换,有信心在x * C2的知识并不下溢。

To be explicit, I am suggesting to use an algorithm such as the one below to decide whether or not to transform the division into something simpler:

为了明确起见,我建议使用如下的算法来决定是否将该划分转换为更简单的方法:

  1. Is Y one of the values that can be transformed using Brisebarre et al's method according to their algorithm?
  2. Y是否可以用Brisebarre等方法根据算法进行转换?
  3. Do C1 and C2 from their method have the same sign, or is it possible to exclude the possibility that the dividend is infinite?
  4. 从他们的方法中,C1和C2是否有相同的符号,或者是否可以排除股息为无穷大的可能性?
  5. Do C1 and C2 from their method have the same sign, or can x take only one of the two representations of 0? If in the case where C1 and C2 have different signs and x can only be one representation of zero, remember to fiddle(**) with the signs of the FMA-based computation to make it produce the correct zero when x is zero.
  6. 从他们的方法中,C1和C2是否有相同的符号,或者x只取0的两个表示中的一个?如果C1和C2有不同的符号,而x只能表示为0,那么请记住使用基于fma的计算的符号,使其在x为0时生成正确的0。
  7. Can the magnitude of the dividend be guaranteed to be large enough to exclude the possibility that x * C2 underflows?
  8. 股利的大小是否可以保证足够大以排除x * C2下流的可能性?

If the answer to the four questions is “yes”, then the division can be transformed into a multiplication and an FMA in the context of the function being compiled. The static analysis described above serves to answer questions 2., 3. and 4.

如果四个问题的答案是“是”,那么这个除法就可以转化为一个乘法和一个FMA在被编译函数的上下文中。上面描述的静态分析可以回答问题2。3。和4。

(**) “fiddling with the signs” means using -FMA(-C1, x, (-C2)*x) in place of FMA(C1, x, C2*x) when this is necessary to make the result come out correctly when x can only be one of the two signed zeroes

(**)“篡改符号”意味着使用-FMA(-C1, x, (-C2)*x)代替FMA(C1, x, C2*x),当x只能是两个有符号的0中的一个时,就必须正确地给出结果。

#3


1  

The result of a floating point division is:

浮点除法的结果是:

  • a sign flag
  • 标志着国旗
  • a significand
  • 一个significand
  • an exponent
  • 一个指数
  • a set of flags (overflow, underflow, inexact, etc - see fenv())
  • 一组旗子(溢流、暗流、不精确等)

Getting the first 3 pieces correct (but the set of flags incorrect) is not enough. Without further knowledge (e.g. which parts of which pieces of the result actually matter, the possible values of the dividend, etc) I would assume that replacing division by a constant with multiplication by a constant (and/or a convoluted FMA mess) is almost never safe.

前3个部分正确(但不正确的标志)是不够的。如果没有进一步的知识(例如,结果的哪些部分实际上是重要的,红利的可能价值等),我将假定用一个常数乘以一个常数(或者一个复杂的FMA的混乱)来替换除法,几乎是不安全的。

In addition; for modern CPUs I also wouldn't assume that replacing a division with 2 FMAs is always an improvement. For example, if the bottleneck is instruction fetch/decode, then this "optimisation" would make performance worse. For another example, if subsequent instructions don't depend on the result (the CPU can do many other instructions in parallel while waiting for the result) the FMA version may introduce multiple dependency stalls and make performance worse. For a third example, if all registers are being used then the FMA version (which requires additional "live" variables) may increase "spilling" and make performance worse.

除了;对于现代cpu,我也不认为用2个FMAs替换一个除法总是一个改进。例如,如果瓶颈是指令获取/解码,那么这种“优化”将使性能更糟。另一个例子是,如果后续指令不依赖于结果(CPU在等待结果时可以并行执行许多其他指令),那么FMA版本可能会引入多个依赖项,并使性能更差。对于第三个例子,如果所有寄存器都被使用,那么FMA版本(需要额外的“活的”变量)可能会增加“溢出”并使性能更糟。

Note that (in many but not all cases) division or multiplication by a constant multiple of 2 can be done with addition alone (specifically, adding a shift count to the exponent).

注意,(在很多情况下,并非所有情况下),一个常数倍的2的除法或乘法可以单独完成(具体地说,将移位计数加到指数中)。

#4


1  

I love @Pascal's answer but in optimization it's often better to have a simple and well-understood subset of transformations rather than a perfect solution.

我喜欢@Pascal的答案,但在优化过程中,最好是有一个简单而理解的转换子集,而不是一个完美的解决方案。

All current and common historical floating point formats had one thing in common: a binary mantissa.

所有当前和常见的历史浮点格式都有一个共同点:二进制的尾数。

Therefore, all fractions were rational numbers of the form:

因此,所有分数都是形式的有理数:

x / 2n

x / 2 n

This is in contrast to the constants in the program (and all possible base-10 fractions) which are rational numbers of the form:

这与程序中的常数(以及所有可能的基础-10分数)相反,它们是形式的合理数字:

x / (2n * 5m)

x / (2n * 5m)

So, one optimization would simply test the input and reciprocal for m == 0, since those numbers are represented exactly in the FP format and operations with them should produce numbers that are accurate within the format.

因此,一个优化将简单地测试m == 0的输入和倒数,因为这些数字完全表示在FP格式中,而与它们的操作应该产生准确的数字。

So, for example, within the (decimal 2-digit) range of .01 to 0.99 dividing or multiplying by the following numbers would be optimized:

例如,在.01到0.99之间的(小数2位数)范围内,对以下数字进行划分或相乘将得到优化:

.25 .50 .75

And everything else would not. (I think, do test it first, lol.)

其他的都不会。(我想,先测试一下,lol。)

#1


6  

Let me restart for the third time. We are trying to accelerate

让我重新开始第三次。我们正在努力加速。

    q = x / y

where y is an integer constant, and q, x, and y are all IEEE 754-2008 binary32 floating-point values. Below, fmaf(a,b,c) indicates a fused multiply-add a * b + c using binary32 values.

其中y是一个整数常量,q、x和y都是IEEE 754-2008 binary32浮点值。下面,fmaf(a,b,c)表示一个融合的multiply,使用binary32值添加一个* b + c。

The naive algorithm is via a precalculated reciprocal,

朴素算法是通过预先计算的倒数,

    C = 1.0f / y

so that at runtime a (much faster) multiplication suffices:

因此,在运行时(更快)的乘法满足:

    q = x * C

The Brisebarre-Muller-Raina acceleration uses two precalculated constants,

Brisebarre-Muller-Raina加速度使用两个预先计算的常数,

    zh = 1.0f / y
    zl = -fmaf(zh, y, -1.0f) / y

so that at runtime, one multiplication and one fused multiply-add suffices:

因此,在运行时,一个乘法和一个融合的多重相加满足:

    q = fmaf(x, zh, x * zl)

The Markstein algorithm combines the naive approach with two fused multiply-adds that yields the correct result if the naive approach yields a result within 1 unit in the least significant place, by precalculating

Markstein算法结合了朴素的方法和两个融合的multiply-添加,如果天真的方法在最不重要的地方,通过预先计算得到一个结果,就会得到正确的结果。

    C1 = 1.0f / y
    C2 = -y

so that the divison can be approximated using

这样就可以用。

    t1 = x * C1
    t2 = fmaf(C1, t1, x)
    q  = fmaf(C2, t2, t1)

The naive approach works for all powers of two y, but otherwise it is pretty bad. For example, for divisors 7, 14, 15, 28, and 30, it yields an incorrect result for more than half of all possible x.

这个天真的方法适用于所有两个y的幂,但除此之外,它是非常糟糕的。例如,对于7、14、15、28和30的因数,它产生的结果不正确,超过了所有可能的x的一半。

The Brisebarre-Muller-Raina approach similarly fails for almost all non-power of two y, but much fewer x yield the incorrect result (less than half a percent of all possible x, varies depending on y).

类似于Brisebarre-Muller-Raina的方法几乎所有的非幂函数都失败了,但是更少的x产生了不正确的结果(在所有可能的x中还不到0.5%,取决于y)。

The Brisebarre-Muller-Raina article shows that the maximum error in the naive approach is ±1.5 ULPs.

Brisebarre-Muller-Raina文章表明,最大误差在±1.5就ulp进行天真的方法。

The Markstein approach yields correct results for powers of two y, and also for odd integer y. (I have not found a failing odd integer divisor for the Markstein approach.)

Markstein方法得到了两个y的幂的正确结果,以及奇数整数y(我还没有发现Markstein方法的一个失败的奇数整数除数)。


For the Markstein approach, I have analysed divisors 1 - 19700 (raw data here).

对于Markstein方法,我分析了1 - 19700(这里的原始数据)。

Plotting the number of failure cases (divisor in the horizontal axis, the number of values of x where Markstein approach fails for said divisor), we can see a simple pattern occur:

绘制失败案例的数量(横轴上的除数,在Markstein方法失败的x处的值的数目),我们可以看到一个简单的模式:

Markstein failure cases http://www.nominal-animal.net/answers/markstein.png

Markstein失败情况下,http://www.nominal-animal.net/answers/markstein.png

Note that these plots have both horizontal and vertical axes logarithmic. There are no dots for odd divisors, as the approach yields correct results for all odd divisors I've tested.

注意,这些图具有水平轴和垂直轴的对数。对于奇数的除数,没有点,因为这个方法对我测试过的所有奇数除数都有正确的结果。

If we change the x axis to the bit reverse (binary digits in reverse order, i.e. 0b11101101 → 0b10110111, data) of the divisors, we have a very clear pattern: Markstein failure cases, bit reverse divisor http://www.nominal-animal.net/answers/markstein-failures.png

如果我们改变x轴的扭转(二进制数字在相反的顺序,即0 b11101101→0 b10110111数据)的因数,我们有一个非常明确的模式:Markstein失败情况下,反向除数http://www.nominal-animal.net/answers/markstein-failures.png

If we draw a straight line through the center of the point sets, we get curve 4194304/x. (Remember, the plot considers only half the possible floats, so when considering all possible floats, double it.) 8388608/x and 2097152/x bracket the entire error pattern completely.

如果我们画一条直线穿过点集的中心,我们得到曲线4194304/x。(请记住,在考虑所有可能的浮点数时,plot只考虑了一半的浮点数,所以当考虑所有可能的浮点数时,将它翻倍。)8388608/x和2097152/x将整个错误模式全部括起来。

Thus, if we use rev(y) to compute the bit reverse of divisor y, then 8388608/rev(y) is a good first order approximation of the number of cases (out of all possible float) where the Markstein approach yields an incorrect result for an even, non-power-of-two divisor y. (Or, 16777216/rev(x) for the upper limit.)

因此,如果我们使用转速(y)计算因子的钻头扭转y,然后8388608 /牧师(y)是一个很好的一阶近似的情况下(所有可能的浮动),甚至Markstein方法产生不正确的结果,non-power-of-two除数y。(或者,16777216 /牧师(x)的上限。)

Added 2016-02-28: I found an approximation for the number of error cases using the Markstein approach, given any integer (binary32) divisor. Here it is as pseudocode:

添加了2016-02-28:我发现了使用Markstein方法的错误数的近似值,给出了任意整数(binary32)除数。这就是伪代码:

function markstein_failure_estimate(divisor):
    if (divisor is zero)
        return no estimate
    if (divisor is not an integer)
        return no estimate

    if (divisor is negative)
        negate divisor

    # Consider, for avoiding underflow cases,
    if (divisor is very large, say 1e+30 or larger)
        return no estimate - do as division

    while (divisor > 16777216)
        divisor = divisor / 2

    if (divisor is a power of two)
        return 0

    if (divisor is odd)
        return 0

    while (divisor is not odd)
        divisor = divisor / 2

    # Use return (1 + 83833608 / divisor) / 2
    # if only nonnegative finite float divisors are counted!
    return 1 + 8388608 / divisor

This yields a correct error estimate to within ±1 on the Markstein failure cases I have tested (but I have not yet adequately tested divisors larger than 8388608). The final division should be such that it reports no false zeroes, but I cannot guarantee it (yet). It does not take into account very large divisors (say 0x1p100, or 1e+30, and larger in magnitude) which have underflow issues -- I would definitely exclude such divisors from acceleration anyway.

这产生了一个正确的错误估计在±1 Markstein失败情况下我已经测试了(但我尚未充分测试了因数大于8388608)。最后的划分应该是这样的,它不会报告错误的零,但是我不能保证它(然而)。它没有考虑到非常大的因数(比如0x1p100,或者1e+30,以及更大的大小),这些都有不足的问题——我肯定会把这些除数排除在加速度之外。

In preliminary testing, the estimate seems uncannily accurate. I did not draw a plot comparing the estimates and the actual errors for divisors 1 to 20000, because the points all coincide exactly in the plots. (Within this range, the estimate is exact, or one too large.) Essentially, the estimates reproduce the first plot in this answer exactly.

在初步测试中,这一估计似乎惊人的准确。我没有画出一幅图,比较了1到2万之间的估计值和实际误差,因为这些点都恰好在图中。(在这个范围内,估计是准确的,或者是太大。)从本质上讲,这一估算准确地再现了这个答案的第一个情节。


The pattern of failures for the Markstein approach is regular, and very interesting. The approach works for all power of two divisors, and all odd integer divisors.

Markstein方法的失败模式是常规的,而且非常有趣。该方法适用于两个除数的所有幂,以及所有奇数整数除数。

For divisors greater than 16777216, I consistently see the same errors as for a divisor that is divided by the smallest power of two to yield a value less than 16777216. For example, 0x1.3cdfa4p+23 and 0x1.3cdfa4p+41, 0x1.d8874p+23 and 0x1.d8874p+32, 0x1.cf84f8p+23 and 0x1.cf84f8p+34, 0x1.e4a7fp+23 and 0x1.e4a7fp+37. (Within each pair, the mantissa is the same, and only the power of two varies.)

对于大于16777216的除数,我始终会看到与除数相同的错误,除数除以2的最小幂,得到小于16777216的值。例如,0x1.3cdfa4p+23和0x1.3cdfa4p+41, 0x1。d8874p + 23和0 x1。d8874p + 32 0 x1。cf84f8p + 23和0 x1。cf84f8p + 34岁,0 x1。e4a7fp + 23和0 x1.e4a7fp + 37。(在每一对中,尾数是一样的,只有两种不同的力量。)

Assuming my test bench is not in error, this means that the Markstein approach also works divisors larger than 16777216 in magnitude (but smaller than, say, 1e+30), if the divisor is such that when divided by the smallest power of two that yields a quotient of less than 16777216 in magnitude, and the quotient is odd.

假设我试验台没有错误,这意味着Markstein方法也适用因数大于16777216级(但说,小于1 e + 30),如果这样,当除以除数的最小功率两个收益率不到16777216级的商,商是奇数。

#2


7  

This question asks for a way to identify the values of the constant Y that make it safe to transform x / Y into a cheaper computation using FMA for all possible values of x. Another approach is to use static analysis to determine an over-approximation of the values x can take, so that the generally unsound transformation can be applied in the knowledge that the values for which the transformed code differs from the original division do not happen.

这个问题要求确定常数Y的值,使其安全的x / Y转换成更便宜的计算使用菲利普-马萨对于x的所有可能值。另一个方法是使用静态分析来确定over-approximation x的值,所以一般不健全的转换可以应用在知识转化代码的值不同于原来的部门不会发生。

Using representations of sets of floating-point values that are well adapted to the problems of floating-point computations, even a forwards analysis starting from the beginning of the function can produce useful information. For instance:

使用浮点值集合的表示,这些值可以很好地适应浮点计算的问题,甚至从函数的开始处开始的远期分析也能产生有用的信息。例如:

float f(float z) {
  float x = 1.0f + z;
  float r = x / Y;
  return r;
}

Assuming the default round-to-nearest mode(*), in the above function x can only be NaN (if the input is NaN), +0.0f, or a number larger than 2-24 in magnitude, but not -0.0f or anything closer to zero than 2-24. This justifies the transformation into one of the two forms shown in the question for many values of the constant Y.

假设默认的圆到最近的模式(*),在上面的函数x中只能是NaN(如果输入是NaN), +0.0f,或大于2-24的数值,但不是-0.0f或接近于0的值,而不是2-24。这证明了转换为两个形式中的一种,这两种形式都是关于Y的许多值的。

(*) assumption without which many optimizations are impossible and that C compilers already make unless the program explicitly uses #pragma STDC FENV_ACCESS ON

(*)假设没有这一假设,许多优化是不可能的,而且C编译器已经做出了,除非程序显式地使用#pragma STDC FENV_ACCESS。


A forwards static analysis that predicts the information for x above can be based on a representation of sets of floating-point values an expression can take as a tuple of:

前面的静态分析可以预测上面的x信息,它可以基于一组浮点值的表示,表达式可以取为:

  • a representation for the sets of possible NaN values (Since behaviors of NaN are underspecified, a choice is to use only a boolean, with true meaning some NaNs can be present, and false indicating no NaN is present.),
  • 对于可能的NaN值集合的表示(由于NaN的行为未被指定,一个选择是只使用一个布尔值,用true表示一些NaNs可以存在,而false表示没有NaN存在。)
  • four boolean flags indicating respectively the presence of +inf, -inf, +0.0, -0.0,
  • 四个布尔标志分别表示+inf, -inf, +0.0, -0.0,
  • an inclusive interval of negative finite floating-point values, and
  • 一个包含负有限浮点值的包含区间。
  • an inclusive interval of positive finite floating-point values.
  • 一个正有限浮点值的包含区间。

In order to follow this approach, all the floating-point operations that can occur in a C program must be understood by the static analyzer. To illustrate, the addition betweens sets of values U and V, to be used to handle + in the analyzed code, can be implemented as:

为了遵循这种方法,可以在C程序中发生的所有浮点操作都必须由静态分析器来理解。为了说明,在分析的代码中使用U和V来处理+的值,可以被实现为:

  • If NaN is present in one of the operands, or if the operands can be infinities of opposite signs, NaN is present in the result.
  • 如果NaN存在于一个操作数中,或者如果操作数可以是相反的符号,那么NaN就会出现在结果中。
  • If 0 cannot be a result of the addition of a value of U and a value of V, use standard interval arithmetic. The upper bound of the result is obtained for the round-to-nearest addition of the largest value in U and the largest value in V, so these bounds should be computed with round-to-nearest.
  • 如果0不能是加上U的值和V的值,则使用标准区间算术。结果的上界得到了U的最大值和V中最大值的圆到最近的加法,所以这些边界应该用圆到最近的方法计算。
  • If 0 can be a result of the addition of a positive value of U and a negative value of V, then let M be the smallest positive value in U such that -M is present in V.
    • if succ(M) is present in U, then this pair of values contributes succ(M) - M to the positive values of the result.
    • 如果在U中出现了succ(M),那么这对值就会对结果的正值产生succ(M) - M。
    • if -succ(M) is present in V, then this pair of values contributes the negative value M - succ(M) to the negative values of the result.
    • 如果-succ(M)存在于V中,那么这一对值就会将负的M -succ(M)贡献给结果的负值。
    • if pred(M) is present in U, then this pair of values contributes the negative value pred(M) - M to the negative values of the result.
    • 如果在U中存在pred(M),那么这一对值将导致负值pred(M) - M值为结果的负值。
    • if -pred(M) is present in V, then this pair of values contributes the value M - pred(M) to the positive values of the result.
    • 如果-pred(M)存在于V中,那么这一对值将M -pred(M)的值贡献给结果的正值。
  • 如果0可以添加积极的结果的价值U和V的负值,然后让M U - M,是最小的积极价值存在于诉如果succ(M)存在于U,然后这副值贡献succ(M)- M的积极价值观的结果。如果-succ(M)存在于V中,那么这一对值就会将负的M -succ(M)贡献给结果的负值。如果在U中存在pred(M),那么这一对值将导致负值pred(M) - M值为结果的负值。如果-pred(M)存在于V中,那么这一对值将M -pred(M)的值贡献给结果的正值。
  • Do the same work if 0 can be the result of the addition of a negative value of U and a positive value of V.
  • 如果0可以是U的负值加上V的正值,那么做同样的工作。

Acknowledgement: the above borrows ideas from “Improving the Floating Point Addition and Subtraction Constraints”, Bruno Marre & Claude Michel

确认:以上的想法来自于“改善浮点加减约束”,Bruno Marre & Claude Michel。


Example: compilation of the function f below:

示例:下面的函数f的编译:

float f(float z, float t) {
  float x = 1.0f + z;
  if (x + t == 0.0f) {
    float r = x / 6.0f;
    return r;
  }
  return 0.0f;
}

The approach in the question refuses to transform the division in function f into an alternate form, because 6 is not one of the value for which the division can be unconditionally transformed. Instead, what I am suggesting is to apply a simple value analysis starting from the beginning of the function which, in this case, determines that x is a finite float either +0.0f or at least 2-24 in magnitude, and to use this information to apply Brisebarre et al's transformation, confident in the knowledge that x * C2 does not underflow.

这个问题的方法拒绝将函数f的分割转化为另一种形式,因为6不是可以无条件转换的值。相反,我建议的是应用价值分析从一开始简单的函数,在这种情况下,确定x是一个有限浮动+ 0.0 f或至少2-24大小,并使用这些信息来应用Brisebarre等的变换,有信心在x * C2的知识并不下溢。

To be explicit, I am suggesting to use an algorithm such as the one below to decide whether or not to transform the division into something simpler:

为了明确起见,我建议使用如下的算法来决定是否将该划分转换为更简单的方法:

  1. Is Y one of the values that can be transformed using Brisebarre et al's method according to their algorithm?
  2. Y是否可以用Brisebarre等方法根据算法进行转换?
  3. Do C1 and C2 from their method have the same sign, or is it possible to exclude the possibility that the dividend is infinite?
  4. 从他们的方法中,C1和C2是否有相同的符号,或者是否可以排除股息为无穷大的可能性?
  5. Do C1 and C2 from their method have the same sign, or can x take only one of the two representations of 0? If in the case where C1 and C2 have different signs and x can only be one representation of zero, remember to fiddle(**) with the signs of the FMA-based computation to make it produce the correct zero when x is zero.
  6. 从他们的方法中,C1和C2是否有相同的符号,或者x只取0的两个表示中的一个?如果C1和C2有不同的符号,而x只能表示为0,那么请记住使用基于fma的计算的符号,使其在x为0时生成正确的0。
  7. Can the magnitude of the dividend be guaranteed to be large enough to exclude the possibility that x * C2 underflows?
  8. 股利的大小是否可以保证足够大以排除x * C2下流的可能性?

If the answer to the four questions is “yes”, then the division can be transformed into a multiplication and an FMA in the context of the function being compiled. The static analysis described above serves to answer questions 2., 3. and 4.

如果四个问题的答案是“是”,那么这个除法就可以转化为一个乘法和一个FMA在被编译函数的上下文中。上面描述的静态分析可以回答问题2。3。和4。

(**) “fiddling with the signs” means using -FMA(-C1, x, (-C2)*x) in place of FMA(C1, x, C2*x) when this is necessary to make the result come out correctly when x can only be one of the two signed zeroes

(**)“篡改符号”意味着使用-FMA(-C1, x, (-C2)*x)代替FMA(C1, x, C2*x),当x只能是两个有符号的0中的一个时,就必须正确地给出结果。

#3


1  

The result of a floating point division is:

浮点除法的结果是:

  • a sign flag
  • 标志着国旗
  • a significand
  • 一个significand
  • an exponent
  • 一个指数
  • a set of flags (overflow, underflow, inexact, etc - see fenv())
  • 一组旗子(溢流、暗流、不精确等)

Getting the first 3 pieces correct (but the set of flags incorrect) is not enough. Without further knowledge (e.g. which parts of which pieces of the result actually matter, the possible values of the dividend, etc) I would assume that replacing division by a constant with multiplication by a constant (and/or a convoluted FMA mess) is almost never safe.

前3个部分正确(但不正确的标志)是不够的。如果没有进一步的知识(例如,结果的哪些部分实际上是重要的,红利的可能价值等),我将假定用一个常数乘以一个常数(或者一个复杂的FMA的混乱)来替换除法,几乎是不安全的。

In addition; for modern CPUs I also wouldn't assume that replacing a division with 2 FMAs is always an improvement. For example, if the bottleneck is instruction fetch/decode, then this "optimisation" would make performance worse. For another example, if subsequent instructions don't depend on the result (the CPU can do many other instructions in parallel while waiting for the result) the FMA version may introduce multiple dependency stalls and make performance worse. For a third example, if all registers are being used then the FMA version (which requires additional "live" variables) may increase "spilling" and make performance worse.

除了;对于现代cpu,我也不认为用2个FMAs替换一个除法总是一个改进。例如,如果瓶颈是指令获取/解码,那么这种“优化”将使性能更糟。另一个例子是,如果后续指令不依赖于结果(CPU在等待结果时可以并行执行许多其他指令),那么FMA版本可能会引入多个依赖项,并使性能更差。对于第三个例子,如果所有寄存器都被使用,那么FMA版本(需要额外的“活的”变量)可能会增加“溢出”并使性能更糟。

Note that (in many but not all cases) division or multiplication by a constant multiple of 2 can be done with addition alone (specifically, adding a shift count to the exponent).

注意,(在很多情况下,并非所有情况下),一个常数倍的2的除法或乘法可以单独完成(具体地说,将移位计数加到指数中)。

#4


1  

I love @Pascal's answer but in optimization it's often better to have a simple and well-understood subset of transformations rather than a perfect solution.

我喜欢@Pascal的答案,但在优化过程中,最好是有一个简单而理解的转换子集,而不是一个完美的解决方案。

All current and common historical floating point formats had one thing in common: a binary mantissa.

所有当前和常见的历史浮点格式都有一个共同点:二进制的尾数。

Therefore, all fractions were rational numbers of the form:

因此,所有分数都是形式的有理数:

x / 2n

x / 2 n

This is in contrast to the constants in the program (and all possible base-10 fractions) which are rational numbers of the form:

这与程序中的常数(以及所有可能的基础-10分数)相反,它们是形式的合理数字:

x / (2n * 5m)

x / (2n * 5m)

So, one optimization would simply test the input and reciprocal for m == 0, since those numbers are represented exactly in the FP format and operations with them should produce numbers that are accurate within the format.

因此,一个优化将简单地测试m == 0的输入和倒数,因为这些数字完全表示在FP格式中,而与它们的操作应该产生准确的数字。

So, for example, within the (decimal 2-digit) range of .01 to 0.99 dividing or multiplying by the following numbers would be optimized:

例如,在.01到0.99之间的(小数2位数)范围内,对以下数字进行划分或相乘将得到优化:

.25 .50 .75

And everything else would not. (I think, do test it first, lol.)

其他的都不会。(我想,先测试一下,lol。)