我怎样才能改进这个平方根方法？

I know this sounds like a homework assignment, but it isn't. Lately I've been interested in algorithms used to perform certain mathematical operations, such as sine, square root, etc. At the moment, I'm trying to write the Babylonian method of computing square roots in C#.

我知道这听起来像是家庭作业,但事实并非如此。最近我一直对用于执行某些数学运算的算法感兴趣,例如正弦,平方根等。目前,我正在尝试编写用C#计算平方根的巴比伦方法。

So far, I have this:

到目前为止,我有这个:

public static double SquareRoot(double x) {
    if (x == 0) return 0;

    double r = x / 2; // this is inefficient, but I can't find a better way
                      // to get a close estimate for the starting value of r
    double last = 0;
    int maxIters = 100;

    for (int i = 0; i < maxIters; i++) {
        r = (r + x / r) / 2;
        if (r == last)
            break;
        last = r;
    }

    return r;
}

It works just fine and produces the exact same answer as the .NET Framework's Math.Sqrt() method every time. As you can probably guess, though, it's slower than the native method (by around 800 ticks). I know this particular method will never be faster than the native method, but I'm just wondering if there are any optimizations I can make.

它工作得很好,并且每次都产生与.NET Framework的Math.Sqrt()方法完全相同的答案。但是,你可能会猜到,它比本机方法慢(大约800个滴答)。我知道这种特殊的方法永远不会比本机方法更快,但我只是想知道我是否可以进行任何优化。

The only optimization I saw immediately was the fact that the calculation would run 100 times, even after the answer had already been determined (at which point, r would always be the same value). So, I added a quick check to see if the newly calculated value is the same as the previously calculated value and break out of the loop. Unfortunately, it didn't make much of a difference in speed, but just seemed like the right thing to do.

我立即看到的唯一优化是计算将运行100次,即使已经确定了答案(此时r将始终是相同的值)。因此,我添加了一个快速检查,以查看新计算的值是否与先前计算的值相同并突破循环。不幸的是,它在速度方面没有太大差别,但似乎是正确的做法。

And before you say "Why not just use Math.Sqrt() instead?"... I'm doing this as a learning exercise and do not intend to actually use this method in any production code.

之前你说“为什么不只是使用Math.Sqrt()?”......我这样做是为了学习练习,并不打算在任何生产代码中实际使用这个方法。

12 个解决方案

#1

First, instead of checking for equality (r == last), you should be checking for convergence, wherein r is close to last, where close is defined by an arbitrary epsilon:

首先,不应检查相等性(r == last),而应检查收敛性,其中r接近last,其中close由任意epsilon定义:

eps = 1e-10  // pick any small number
if (Math.Abs(r-last) < eps) break;

As the wikipedia article you linked to mentions - you don't efficiently calculate square roots with Newton's method - instead, you use logarithms.

作为您提及的*文章 - 您不能使用牛顿方法有效地计算平方根 - 而是使用对数。

#2

float InvSqrt (float x){
  float xhalf = 0.5f*x;
  int i = *(int*)&x;
  i = 0x5f3759df - (i>>1);
  x = *(float*)&i;
  x = x*(1.5f - xhalf*x*x);
  return x;}

This is my favorite fast square root. Actually it's the inverse of the square root, but you can invert it after if you want....I can't say if it's faster if you want the square root and not the inverse square root, but it's freaken cool just the same.
http://www.beyond3d.com/content/articles/8/

这是我最喜欢的快速方根。实际上它是平方根的倒数,但你可以在你想要之后将其反转......我不能说如果你想要平方根而不是平方根,那么它是否更快,但是它的变形很酷。 http://www.beyond3d.com/content/articles/8/

#3

What you are doing here is you execute Newton's method of finding a root. So you could just use some more efficient root-finding algorithm. You can start searching for it here.

你在这里做的是你执行牛顿找到根的方法。所以你可以使用一些更有效的根寻找算法。你可以在这里开始搜索它。

#4

Replacing the division by 2 with a bit shift is unlikely to make that big a difference; given that the division is by a constant I'd hope the compiler is smart enough to do that for you, but you may as well try it to see.

用一个位移代替2除法不太可能产生那么大的差异;鉴于该除法是一个常数,我希望编译器足够聪明,可以为你做到这一点,但你也可以尝试一下。

You're much more likely to get an improvement by exiting from the loop early, so either store new r in a variable and compare with old r, or store x/r in a variable and compare that against r before doing the addition and division.

你很有可能通过提前退出循环来获得改进,所以要么将新r存储在变量中并与旧r进行比较,要么将x / r存储在变量中并在执行加法和除法之前将其与r进行比较。

#5

Instead of breaking the loop and then returning r, you could just return r. May not provide any noticable increase in performance.

而不是打破循环然后返回r,你可以只返回r。可能无法提供任何显着的性能提升。

#6

With your method, each iteration doubles the number of correct bits.

使用您的方法,每次迭代都会使正确的位数加倍。

Using a table to obtain the initial 4 bits (for example), you will have 8 bits after the 1st iteration, then 16 bits after the second, and all the bits you need after the fourth iteration (since a double stores 52+1 bits of mantissa).

使用表来获取最初的4位(例如),在第一次迭代后将有8位,然后在第二次迭代后有16位,以及在第四次迭代后需要的所有位(因为双存储52 + 1位)尾数)。

For a table lookup, you can extract the mantissa in [0.5,1[ and exponent from the input (using a function like frexp), then normalize the mantissa in [64,256[ using multiplication by a suitable power of 2.

对于表查找,您可以在[0.5,1 [和输入中的指数(使用类似frexp的函数))中提取尾数,然后在[64,256 [使用乘以适当的2的幂]中对尾数进行归一化。

mantissa *= 2^K
exponent -= K

After this, your input number is still mantissa*2^exponent. K must be 7 or 8, to obtain an even exponent. You can obtain the initial value for the iterations from a table containing all the square roots of the integral part of mantissa. Perform 4 iterations to get the square root r of mantissa. The result is r*2^(exponent/2), constructed using a function like ldexp.

在此之后,您的输入数字仍然是尾数* 2 ^指数。 K必须是7或8才能获得偶数指数。您可以从包含尾数的整数部分的所有平方根的表中获取迭代的初始值。执行4次迭代以获得尾数的平方根r。结果是r * 2 ^(指数/ 2),使用像ldexp这样的函数构造。

EDIT. I put some C++ code below to illustrate this. The OP's function sr1 with improved test takes 2.78s to compute 2^24 square roots; my function sr2 takes 1.42s, and the hardware sqrt takes 0.12s.

编辑。我在下面放了一些C ++代码来说明这一点。具有改进测试的OP函数sr1需要2.78秒来计算2 ^ 24平方根;我的函数sr2需要1.42s,硬件sqrt需要0.12s。

#include <math.h>
#include <stdio.h>

double sr1(double x)
{
  double last = 0;
  double r = x * 0.5;
  int maxIters = 100;
  for (int i = 0; i < maxIters; i++) {
    r = (r + x / r) / 2;
    if ( fabs(r - last) < 1.0e-10 )
      break;
    last = r;
  }
  return r;
}

double sr2(double x)
{
  // Square roots of values in 0..256 (rounded to nearest integer)
  static const int ROOTS256[] = {
    0,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,
    7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,
    9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,
    11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,
    12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
    13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,
    14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,
    15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16 };

  // Normalize input
  int exponent;
  double mantissa = frexp(x,&exponent); // MANTISSA in [0.5,1[ unless X is 0
  if (mantissa == 0) return 0; // X is 0
  if (exponent & 1) { mantissa *= 128; exponent -= 7; } // odd exponent
  else { mantissa *= 256; exponent -= 8; } // even exponent
  // Here MANTISSA is in [64,256[

  // Initial value on 4 bits
  double root = ROOTS256[(int)floor(mantissa)];

  // Iterate
  for (int it=0;it<4;it++)
    {
      root = 0.5 * (root + mantissa / root);
    }

  // Restore exponent in result
  return ldexp(root,exponent>>1);
}

int main()
{
  // Used to generate the table
  // for (int i=0;i<=256;i++) printf(",%.0f",sqrt(i));

  double s = 0;
  int mx = 1<<24;
  // for (int i=0;i<mx;i++) s += sqrt(i); // 0.120s
  // for (int i=0;i<mx;i++) s += sr1(i);  // 2.780s
  for (int i=0;i<mx;i++) s += sr2(i);  // 1.420s
}

#7

Define a tolerance and return early when subsequent iterations fall within that tolerance.

定义公差并在后续迭代落入该公差范围内时提前返回。

#8

Since you said the code below was not fast enough, try this:

既然你说下面的代码不够快,试试这个:

    static double guess(double n)
    {
        return Math.Pow(10, Math.Log10(n) / 2);
    }

It should be very accurate and hopefully fast.

它应该非常准确,并且希望很快。

Here is code for the initial estimate described here. It appears to be pretty good. Use this code, and then you should also iterate until the values converge within an epsilon of difference.

这是此处描述的初始估算的代码。它看起来很不错。使用此代码,然后您还应该迭代,直到值收敛于差异的epsilon。

    public static double digits(double x)
    {
        double n = Math.Floor(x);
        double d;

        if (d >= 1.0)
        {
            for (d = 1; n >= 1.0; ++d)
            {
                n = n / 10;
            }
        }
        else
        {
            for (d = 1; n < 1.0; ++d)
            {
                n = n * 10;
            }
        }


        return d;
    }

    public static double guess(double x)
    {
        double output;
        double d = Program.digits(x);

        if (d % 2 == 0)
        {
            output = 6*Math.Pow(10, (d - 2) / 2);
        }
        else
        {
            output = 2*Math.Pow(10, (d - 1) / 2);
        }

        return output;
    }

#9

I have been looking at this as well for learning purposes. You may be interested in two modifications I tried.

我一直在考虑这个以用于学习目的。您可能对我尝试过的两个修改感兴趣。

The first was to use a first order taylor series approximation in x0:

第一个是在x0中使用一阶泰勒级数近似:

    Func<double, double> fNewton = (b) =>
    {
        // Use first order taylor expansion for initial guess
        // http://www27.wolframalpha.com/input/?i=series+expansion+x^.5
        double x0 = 1 + (b - 1) / 2;
        double xn = x0;
        do
        {
            x0 = xn;
            xn = (x0 + b / x0) / 2;
        } while (Math.Abs(xn - x0) > Double.Epsilon);
        return xn;
    };

The second was to try a third order (more expensive), iterate

第二个是尝试第三个订单(更昂贵),迭代

    Func<double, double> fNewtonThird = (b) =>
    {
        double x0 = b/2;
        double xn = x0;
        do
        {
            x0 = xn;
            xn = (x0*(x0*x0+3*b))/(3*x0*x0+b);
        } while (Math.Abs(xn - x0) > Double.Epsilon);
        return xn;
    };

I created a helper method to time the functions

我创建了一个辅助方法来计算函数的时间

public static class Helper
{
    public static long Time(
        this Func<double, double> f,
        double testValue)
    {
        int imax = 120000;
        double avg = 0.0;
        Stopwatch st = new Stopwatch();
        for (int i = 0; i < imax; i++)
        {
            // note the timing is strictly on the function
            st.Start();
            var t = f(testValue);
            st.Stop();
            avg = (avg * i + t) / (i + 1);
        }
        Console.WriteLine("Average Val: {0}",avg);
        return st.ElapsedTicks/imax;
    }
}

The original method was faster, but again, might be interesting :)

原始方法更快,但再次,可能有趣:)

#10

Replacing "/ 2" by "* 0.5" makes this ~1.5 times faster on my machine, but of course not nearly as fast as the native implementation.

将“/ 2”替换为“* 0.5”使得我的机器上的速度快〜1.5倍,但当然不如原生实现快。

#11

Well, the native Sqrt() function probably isn't implemented in C#, it'll most likely be done in a low-level language, and it'll certainly be using a more efficient algorithm. So trying to match its speed is probably futile.

好吧,本机Sqrt()函数可能没有在C#中实现,它很可能是用低级语言完成的,它肯定会使用更高效的算法。所以试图匹配它的速度可能是徒劳的。

However, in regard to just trying to optimize your function for the heckuvit, the Wikipedia page you linked recommends the "starting guess" to be 2^floor(D/2), where D represents the number of binary digits in the number. You could give that an attempt, I don't see much else that could be optimized significantly in your code.

但是,关于只是尝试优化heckuvit的功能,您链接的*页面建议“开始猜测”为2 ^ floor(D / 2),其中D表示数字中的二进制数字的数量。你可以尝试一下,我没有看到很多可以在你的代码中显着优化的东西。

#12

You can try r = x >> 1;

你可以尝试r = x >> 1;

instead of / 2 (also in the other place you device by 2). It might give you a slight edge. I would also move the 100 into the loop. Probably nothing, but we are talking about ticks in here.

而不是/ 2(也在你设备的另一个地方2)。它可能会给你一点点优势。我也会将100移动到循环中。可能没什么,但我们在这里讨论的是滴答声。

just checking it now.

现在就检查一下。

EDIT: Fixed the > into >>, but it doesn't work for doubles, so nevermind. the inlining of the 100 gave me no speed increase.

编辑:修正了>进入>>,但它不适用于双打,所以没关系。 100的内联给了我无速度提升。

#1