对64位整数快速计算log2

A great programming resource, Bit Twiddling Hacks, proposes (here) the following method to compute log2 of a 32-bit integer:

一个伟大的编程资源，有点扭曲的黑客，建议(这里)下面的方法来计算一个32位整数的log2:

#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
static const char LogTable256[256] = 
{
    -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
    LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};

unsigned int v; // 32-bit word to find the log of
unsigned r;     // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
    r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else 
{
    r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}

and mentions that

和提到,

The lookup table method takes only about 7 operations to find the log of a 32-bit value. If extended for 64-bit quantities, it would take roughly 9 operations.

查找表方法只需要大约7个操作就可以找到32位值的日志。如果扩展到64位数量，大约需要9个操作。

but, alas, doesn't give any additional info about which way one should actually go to extend the algorithm to 64-bit integers.

但是，唉，并没有提供任何关于如何将算法扩展到64位整数的额外信息。

Any hints about how a 64-bit algorithm of this kind would look like?

有什么关于这种64位算法的提示吗?

6 个解决方案

#1

Intrinsic functions are really fast but still are insufficient for a truly cross-platform, compiler-independent implementation of log2. So in case if anyone is interested, here is the fastest, branch-free, CPU-abstract DeBruijn-like algorithm I've come to while researching the topic on my own.

内在功能确实很快，但对于真正跨平台、独立于编译器的log2实现来说仍然不够。如果有人感兴趣的话，这里有一个最快的，无分支的，cpu抽象的debruihn算法。

const int tab64[64] = {
    63,  0, 58,  1, 59, 47, 53,  2,
    60, 39, 48, 27, 54, 33, 42,  3,
    61, 51, 37, 40, 49, 18, 28, 20,
    55, 30, 34, 11, 43, 14, 22,  4,
    62, 57, 46, 52, 38, 26, 32, 41,
    50, 36, 17, 19, 29, 10, 13, 21,
    56, 45, 25, 31, 35, 16,  9, 12,
    44, 24, 15,  8, 23,  7,  6,  5};

int log2_64 (uint64_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    value |= value >> 32;
    return tab64[((uint64_t)((value - (value >> 1))*0x07EDD5E59A4E28C2)) >> 58];
}

The part of rounding down to the next lower power of 2 was taken from Power-of-2 Boundaries and the part of getting the number of trailing zeros was taken from BitScan (the (bb & -bb) code there is to single out the rightmost bit that is set to 1, which is not need after we've rounded the value down to the next power of 2).

四舍五入到下的2的幂来自2的幂边界和得到的数量落后于0的部分来自BitScan((bb & bb)代码有挑出最右边的位设置为1,这是不需要我们搜集之后的值到下一个力量2)。

And the 32-bit implementation, by the way, is

顺便说一下，32位实现是

const int tab32[32] = {
     0,  9,  1, 10, 13, 21,  2, 29,
    11, 14, 16, 18, 22, 25,  3, 30,
     8, 12, 20, 28, 15, 17, 24,  7,
    19, 27, 23,  6, 26,  5,  4, 31};

int log2_32 (uint32_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    return tab32[(uint32_t)(value*0x07C4ACDD) >> 27];
}

As with any other computational method, log2 requires the input value to be greater than zero.

与任何其他计算方法一样，log2要求输入值大于0。

#2

If you are using GCC, a lookup table is unnecessary in this case.

如果您正在使用GCC，那么在这种情况下不需要查找表。

GCC provides a builtin function to determine the amount of leading zeros:

GCC提供一个内置函数来确定前导0的数量:

Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

内置函数:int __builtin_clz (unsigned int x)返回x中的领先0位，从最重要的位位置开始。如果x = 0，则结果没有定义。

So you can define:

所以您可以定义:

#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))

and it will work for any unsigned long long int. The result is rounded down.

它适用于任何无符号长整数。结果是四舍五入的。

For x86 and AMD64 GCC will compile it to a bsr instruction, so the solution is very fast (much faster than lookup tables).

对于x86和AMD64, GCC将其编译为bsr指令，因此解决方案非常快(比查找表快得多)。

Working example:

工作的例子:

#include <stdio.h>

#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))

int main(void) {
    unsigned long long input;
    while (scanf("%llu", &input) == 1) {
        printf("log(%llu) = %u\n", input, LOG2(input));
    }
    return 0;
}

#3

I was trying to convert Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup to 64-bit by brute forcing the magic number. Needless to say it was taking a while.

我试着在O(lg(N))操作中通过强制使用神奇数字将乘法和查找转换成64位的log以2为底的N位整数。不用说，这需要一段时间。

I then found Desmond's answer and decided to try his magic number as a start point. Since I have a 6 core processor I ran it in parallel starting at 0x07EDD5E59A4E28C2 / 6 multiples. I was surprised it found something immediately. Turns out 0x07EDD5E59A4E28C2 / 2 worked.

然后我找到了德斯蒙德的答案，并决定试试他的神奇数字作为起点。因为我有一个6核处理器，我在0x07EDD5E59A4E28C2 / 6的倍数上并行运行它。我很惊讶它马上就发现了什么。原来0x07EDD5E59A4E28C2 / 2是有效的。

So here is the code for 0x07EDD5E59A4E28C2 which saves you a shift and subtract:

这是0x07EDD5E59A4E28C2的代码这就省去了移位和减法:

int LogBase2(uint64_t n)
{
    static const int table[64] = {
        0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61,
        51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62,
        57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56,
        45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, 63 };

    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;

    return table[(n * 0x03f6eaf2cd271461) >> 58];
}

#4

Base-2 Integer Logarithm

Here is what I do for 64-bit unsigned integers. This calculates the floor of the base-2 logarithm, which is equivalent to the index of the most significant bit. This method is smokingly fast for large numbers because it uses an unrolled loop that executes always in log₂64 = 6 steps.

这是我对64位无符号整数所做的。这就计算了基底-2对数的底面，它等价于最显著位的指数。这种方法对大量快速的冒烟,因为它使用一个展开循环执行总在日志₂64 = 6步骤。

Essentially, what it does is subtracts away progressively smaller squares in the sequence { 0 ≤ k ≤ 5: 2^(2^k) } = { 2³², 2¹⁶, 2⁸, 2⁴, 2², 2¹ } = { 4294967296, 65536, 256, 16, 4, 2, 1 } and sums the exponents k of the subtracted values.

从本质上讲,它是减去逐步序列中的小方块{ 0 k≤≤5:2 ^(2 ^ k)} = { 2³²,2¹⁶,2⁸2⁴2²,2¹} = { 4,4294967296,65536,256,16日2 1 }和金额减去的指数k值。

int uint64_log2(uint64_t n)
{
  #define S(k) if (n >= (UINT64_C(1) << k)) { i += k; n >>= k; }

  int i = -(n == 0); S(32); S(16); S(8); S(4); S(2); S(1); return i;

  #undef S
}

Note that this returns –1 if given the invalid input of 0 (which is what the initial -(n == 0) is checking for). If you never expect to invoke it with n == 0, you could substitute int i = 0; for the initializer and add assert(n != 0); at entry to the function.

注意，如果给定无效输入0(这是初始-(n == 0)的值)，则返回- 1。如果你从不期望用n = 0调用它，你可以代入int i = 0;初始化和添加assert(n != 0);在函数的入口。

Base-10 Integer Logarithm

Base-10 integer logarithms can be calculated using similarly — with the largest square to test being 10¹⁶ because log₁₀2⁶⁴ ≅ 19.2659... (Note: This is not the fastest way to accomplish a base-10 integer logarithm, because it uses integer division, which is inherently slow. A faster implementation would be to use an accumulator with values that grow exponentially, and compare against the accumulator, in effect doing a sort of binary search.)

八进制数数整数对数可以使用同样的计算——最大的广场测试10¹⁶因为日志₁₀2⁶⁴≅19.2659……(注意:这不是实现base-10整数对数的最快方法，因为它使用的是整数除法，它天生就是慢的。一个更快的实现方法是使用一个值呈指数增长的累加器，并与累加器进行比较，实际上是进行一种二进制搜索。

int uint64_log10(uint64_t n)
{
  #define S(k, m) if (n >= UINT64_C(m)) { i += k; n /= UINT64_C(m); }

  int i = -(n == 0);
  S(16,10000000000000000); S(8,100000000); S(4,10000); S(2,100); S(1,10);
  return i;

  #undef S
}

#5

Here's a pretty compact and fast extension, using no additional temporaries:

这是一个非常紧凑和快速的扩展，不需要额外的时间:

r = 0;

/* If its wider than 32 bits, then we already know that log >= 32.
So store it in R.  */
if (v >> 32)
  {
    r = 32;
    v >>= 32;
  }

/* Now do the exact same thing as the 32 bit algorithm,
except we ADD to R this time.  */
if (tt = v >> 16)
  {
    r += (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
  }
else
  {
    r += (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
  }

Here is one built with a chain of ifs, again using no additional temporaries. Might not be the fastest though.

这是一个用if链构建的系统，同样不使用额外的时间。也许不是最快的。

  if (tt = v >> 48)
    {
      r = (t = tt >> 8) ? 56 + LogTable256[t] : 48 + LogTable256[tt];
    }
  else if (tt = v >> 32)
    {
      r = (t = tt >> 8) ? 40 + LogTable256[t] : 32 + LogTable256[tt];
    }
  else if (tt = v >> 16)
    {
      r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
    }
  else 
    {
      r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
    }

#6

The algorithm basically finds out which byte contains the most significant 1 bit, and then looks up that byte in the lookup to find the log of the byte, then adds it to the position of the byte.

算法基本上是找出哪个字节包含最重要的1位，然后在查找中查找该字节，找到该字节的日志，然后将其添加到字节的位置。

Here is a somewhat simplified version of the 32-bit algorithm:

以下是32位算法的简化版本:

if (tt = v >> 16)
{
    if (t = tt >> 8)
    {
        r = 24 + LogTable256[t];
    }
    else
    {
        r = 16 + LogTable256[tt];
    }
}
else 
{
    if (t = v >> 8)
    {
        r = 8 + LogTable256[t];
    }
    else
    {
        r = LogTable256[v];
    }
}

This is the equivalent 64-bit algorithm:

这是等效的64位算法:

if (ttt = v >> 32)
{
    if (tt = ttt >> 16)
    {
        if (t = tt >> 8)
        {
            r = 56 + LogTable256[t];
        }
        else
        {
            r = 48 + LogTable256[tt];
        }
    }
    else 
    {
        if (t = ttt >> 8)
        {
            r = 40 + LogTable256[t];
        }
        else
        {
            r = 32 + LogTable256[ttt];
        }
    }
}
else
{
    if (tt = v >> 16)
    {
        if (t = tt >> 8)
        {
            r = 24 + LogTable256[t];
        }
        else
        {
            r = 16 + LogTable256[tt];
        }
    }
    else 
    {
        if (t = v >> 8)
        {
            r = 8 + LogTable256[t];
        }
        else
        {
            r = LogTable256[v];
        }
    }
}

I came up with an algorithm for any size types that I think is nicer than the original.

我想出了一种算法，适用于任何我认为比原来更好的类型。

unsigned int v = 42;
unsigned int r = 0;
unsigned int b;
for (b = sizeof(v) << 2; b; b = b >> 1)
{
    if (v >> b)
    {
        v = v >> b;
        r += b;
    }
}

Note: b = sizeof(v) << 2 sets b to half the number of bits in v. I used shifting instead of multiplication here (just because I felt like it).

注意:b = sizeof(v) < 2 set b to half of bits in v我这里用移位代替乘法(只是因为我喜欢)。

You could add a lookup table to that algorithm to speed it up possibly, but it's more a proof-of-concept.

您可以向该算法添加一个查找表来加速它，但这更多的是一个概念验证。

#1

const int tab64[64] = {
    63,  0, 58,  1, 59, 47, 53,  2,
    60, 39, 48, 27, 54, 33, 42,  3,
    61, 51, 37, 40, 49, 18, 28, 20,
    55, 30, 34, 11, 43, 14, 22,  4,
    62, 57, 46, 52, 38, 26, 32, 41,
    50, 36, 17, 19, 29, 10, 13, 21,
    56, 45, 25, 31, 35, 16,  9, 12,
    44, 24, 15,  8, 23,  7,  6,  5};

int log2_64 (uint64_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    value |= value >> 32;
    return tab64[((uint64_t)((value - (value >> 1))*0x07EDD5E59A4E28C2)) >> 58];
}

And the 32-bit implementation, by the way, is

顺便说一下，32位实现是

const int tab32[32] = {
     0,  9,  1, 10, 13, 21,  2, 29,
    11, 14, 16, 18, 22, 25,  3, 30,
     8, 12, 20, 28, 15, 17, 24,  7,
    19, 27, 23,  6, 26,  5,  4, 31};

int log2_32 (uint32_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    return tab32[(uint32_t)(value*0x07C4ACDD) >> 27];
}

As with any other computational method, log2 requires the input value to be greater than zero.

与任何其他计算方法一样，log2要求输入值大于0。

#2

If you are using GCC, a lookup table is unnecessary in this case.

如果您正在使用GCC，那么在这种情况下不需要查找表。

GCC provides a builtin function to determine the amount of leading zeros:

GCC提供一个内置函数来确定前导0的数量:

Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

内置函数:int __builtin_clz (unsigned int x)返回x中的领先0位，从最重要的位位置开始。如果x = 0，则结果没有定义。

So you can define:

所以您可以定义:

#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))

and it will work for any unsigned long long int. The result is rounded down.

它适用于任何无符号长整数。结果是四舍五入的。

For x86 and AMD64 GCC will compile it to a bsr instruction, so the solution is very fast (much faster than lookup tables).

对于x86和AMD64, GCC将其编译为bsr指令，因此解决方案非常快(比查找表快得多)。

Working example:

工作的例子:

#include <stdio.h>

#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))

int main(void) {
    unsigned long long input;
    while (scanf("%llu", &input) == 1) {
        printf("log(%llu) = %u\n", input, LOG2(input));
    }
    return 0;
}

#3

I was trying to convert Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup to 64-bit by brute forcing the magic number. Needless to say it was taking a while.

我试着在O(lg(N))操作中通过强制使用神奇数字将乘法和查找转换成64位的log以2为底的N位整数。不用说，这需要一段时间。

So here is the code for 0x07EDD5E59A4E28C2 which saves you a shift and subtract:

这是0x07EDD5E59A4E28C2的代码这就省去了移位和减法:

int LogBase2(uint64_t n)
{
    static const int table[64] = {
        0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61,
        51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62,
        57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56,
        45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, 63 };

    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;

    return table[(n * 0x03f6eaf2cd271461) >> 58];
}

#4

Base-2 Integer Logarithm

从本质上讲,它是减去逐步序列中的小方块{ 0 k≤≤5:2 ^(2 ^ k)} = { 2³²,2¹⁶,2⁸2⁴2²,2¹} = { 4,4294967296,65536,256,16日2 1 }和金额减去的指数k值。

int uint64_log2(uint64_t n)
{
  #define S(k) if (n >= (UINT64_C(1) << k)) { i += k; n >>= k; }

  int i = -(n == 0); S(32); S(16); S(8); S(4); S(2); S(1); return i;

  #undef S
}

Base-10 Integer Logarithm

int uint64_log10(uint64_t n)
{
  #define S(k, m) if (n >= UINT64_C(m)) { i += k; n /= UINT64_C(m); }

  int i = -(n == 0);
  S(16,10000000000000000); S(8,100000000); S(4,10000); S(2,100); S(1,10);
  return i;

  #undef S
}

#5

Here's a pretty compact and fast extension, using no additional temporaries:

这是一个非常紧凑和快速的扩展，不需要额外的时间:

r = 0;

/* If its wider than 32 bits, then we already know that log >= 32.
So store it in R.  */
if (v >> 32)
  {
    r = 32;
    v >>= 32;
  }

/* Now do the exact same thing as the 32 bit algorithm,
except we ADD to R this time.  */
if (tt = v >> 16)
  {
    r += (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
  }
else
  {
    r += (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
  }

Here is one built with a chain of ifs, again using no additional temporaries. Might not be the fastest though.

这是一个用if链构建的系统，同样不使用额外的时间。也许不是最快的。

  if (tt = v >> 48)
    {
      r = (t = tt >> 8) ? 56 + LogTable256[t] : 48 + LogTable256[tt];
    }
  else if (tt = v >> 32)
    {
      r = (t = tt >> 8) ? 40 + LogTable256[t] : 32 + LogTable256[tt];
    }
  else if (tt = v >> 16)
    {
      r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
    }
  else 
    {
      r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
    }

#6

The algorithm basically finds out which byte contains the most significant 1 bit, and then looks up that byte in the lookup to find the log of the byte, then adds it to the position of the byte.

算法基本上是找出哪个字节包含最重要的1位，然后在查找中查找该字节，找到该字节的日志，然后将其添加到字节的位置。

Here is a somewhat simplified version of the 32-bit algorithm:

以下是32位算法的简化版本:

if (tt = v >> 16)
{
    if (t = tt >> 8)
    {
        r = 24 + LogTable256[t];
    }
    else
    {
        r = 16 + LogTable256[tt];
    }
}
else 
{
    if (t = v >> 8)
    {
        r = 8 + LogTable256[t];
    }
    else
    {
        r = LogTable256[v];
    }
}

This is the equivalent 64-bit algorithm:

这是等效的64位算法:

if (ttt = v >> 32)
{
    if (tt = ttt >> 16)
    {
        if (t = tt >> 8)
        {
            r = 56 + LogTable256[t];
        }
        else
        {
            r = 48 + LogTable256[tt];
        }
    }
    else 
    {
        if (t = ttt >> 8)
        {
            r = 40 + LogTable256[t];
        }
        else
        {
            r = 32 + LogTable256[ttt];
        }
    }
}
else
{
    if (tt = v >> 16)
    {
        if (t = tt >> 8)
        {
            r = 24 + LogTable256[t];
        }
        else
        {
            r = 16 + LogTable256[tt];
        }
    }
    else 
    {
        if (t = v >> 8)
        {
            r = 8 + LogTable256[t];
        }
        else
        {
            r = LogTable256[v];
        }
    }
}

I came up with an algorithm for any size types that I think is nicer than the original.

我想出了一种算法，适用于任何我认为比原来更好的类型。

unsigned int v = 42;
unsigned int r = 0;
unsigned int b;
for (b = sizeof(v) << 2; b; b = b >> 1)
{
    if (v >> b)
    {
        v = v >> b;
        r += b;
    }
}

Note: b = sizeof(v) << 2 sets b to half the number of bits in v. I used shifting instead of multiplication here (just because I felt like it).

注意:b = sizeof(v) < 2 set b to half of bits in v我这里用移位代替乘法(只是因为我喜欢)。

You could add a lookup table to that algorithm to speed it up possibly, but it's more a proof-of-concept.

您可以向该算法添加一个查找表来加速它，但这更多的是一个概念验证。

秒客网

对64位整数快速计算log2

6 个解决方案

#1

#2

#3

#4

Base-2 Integer Logarithm

Base-10 Integer Logarithm

#5

#6

#1

#2

#3

#4

Base-2 Integer Logarithm

Base-10 Integer Logarithm

#5

#6

相关文章