Trailing/leading zero count for a byte

I'm using Java and I'm coding a chess engine.

I'm trying to find the index of the first 1 bit and the index of the last 1 bit in a byte.

I'm currently using Long.numberOfTrailingZeros() (or something like that) in Java, and would like to emulate that functionality, except with bytes.

Would it be something like:

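int b = 0b11000101;  // 8-bit value kept in an int; 197 does not fit a signed Java byte without a cast
int firstOneBit = bitCount((b & -b) - 1);  // b & -b isolates the lowest set bit; minus 1 leaves one 1 per trailing zero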

If so, how would I implement bitCount relatively efficiently? I don't mind good explanations, so please don't just give me code.

4 Answers

#1

Use a lookup table with 256 entries. To create it:

unsigned int bitcount(unsigned int i) {
    unsigned int r = 0;
    while (i) { r += i & 1; i >>= 1; } /* in Java, use the unsigned shift >>>= and test i != 0 */
    return r;
}

This of course does not need to be fast, since you run it at most 256 times to initialize your table.
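
As a rough Java sketch of the table idea (the class and method names here are illustrative assumptions, not from the answer), the table can be filled once with the simple loop and then used both for counting bits and for the trailing-zero identity from the question:

```java
final class ByteBitTable {
    // BIT_COUNT[v] = number of 1 bits in v, for every byte value 0..255.
    private static final byte[] BIT_COUNT = new byte[256];

    static {
        for (int v = 0; v < 256; v++) {
            int r = 0;
            for (int i = v; i != 0; i >>>= 1) {
                r += i & 1;
            }
            BIT_COUNT[v] = (byte) r;
        }
    }

    /** Number of set bits in b, treating b as unsigned (0..255). */
    static int bitCount(byte b) {
        return BIT_COUNT[b & 0xFF];
    }

    /** Trailing zeros of b; returns 8 when b == 0. */
    static int trailingZeros(byte b) {
        int v = b & 0xFF; // undo sign extension of the byte
        // v & -v isolates the lowest set bit; subtracting 1 turns the
        // trailing zeros into ones. For v == 0 the result is -1, which
        // the mask reduces to 0xFF, whose bit count is 8.
        return BIT_COUNT[((v & -v) - 1) & 0xFF];
    }

    public static void main(String[] args) {
        System.out.println(trailingZeros((byte) 0b11000100)); // 2
        System.out.println(bitCount((byte) 0b11000101));      // 4
    }
}
```

The `& 0xFF` matters because a Java byte is signed: without it, values of 128 and above would become negative array indices.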

#2

The correct answer is that almost all processors have special instructions for this sort of thing (leading zeros, trailing zeros, number of ones, etc.). x86 has bsf/bsr, PowerPC has cntlzw, ARM has clz, and so on. Hopefully Integer.numberOfTrailingZeros is smart enough to use these; that is probably the only way to have a chance of using this sort of platform-specific instruction from Java (if it uses them at all).

The Aggregate Magic Algorithms is another place with approaches to this sort of problem, ranging from the obvious (lookup tables) to some rather clever SWAR approaches. But I suspect they all lose to Integer.numberOfTrailingZeros(x) if the Java runtime is smart about it: the method is static, so no boxing is involved, and if the JIT also uses a platform-specific instruction for it, that will win.
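
A minimal sketch of leaning on the built-in methods for a byte, assuming the goal is the index of the lowest and highest set bit as in the question (the class and method names are hypothetical). Whether the JIT turns these calls into single instructions depends on the runtime and the CPU:

```java
final class ByteBitIndex {
    /** Index (0..7) of the lowest set bit of b, or -1 if b == 0. */
    static int lowestSetBit(byte b) {
        int v = b & 0xFF; // mask off the sign extension of the byte
        return v == 0 ? -1 : Integer.numberOfTrailingZeros(v);
    }

    /** Index (0..7) of the highest set bit of b, or -1 if b == 0. */
    static int highestSetBit(byte b) {
        int v = b & 0xFF;
        // numberOfLeadingZeros(0) is 32, so a zero byte yields -1.
        return 31 - Integer.numberOfLeadingZeros(v);
    }

    public static void main(String[] args) {
        byte b = (byte) 0b11000100;
        System.out.println(lowestSetBit(b));  // 2
        System.out.println(highestSetBit(b)); // 7
    }
}
```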

Just for completeness, the other classic archive of brilliant bit-whacking is the old MIT HAKMEM collection (there's also a semi-modernized C version if your PDP-6/10 assembler skills have gotten rusty).

#3

/* Count Leading Zeroes */

static uint8_t clzlut[256] = {
  8,7,6,6,5,5,5,5,
  4,4,4,4,4,4,4,4,
  3,3,3,3,3,3,3,3,
  3,3,3,3,3,3,3,3,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0
};

uint32_t clz(uint32_t val)
{
  uint32_t accum = 0;

  accum += clzlut[val >> 24];
  accum += (accum == 8 ) ? clzlut[(val >> 16) & 0xFF] : 0;
  accum += (accum == 16) ? clzlut[(val >>  8) & 0xFF] : 0;
  accum += (accum == 24) ? clzlut[ val        & 0xFF] : 0;

  return accum;     
}

Explanation:

This works by storing the number of leading zeroes for every possible byte value in a lookup table. You use the byte value as the index to look up its count of leading zeroes. Since the example does this for an unsigned int, you shift and mask out the four individual bytes and accumulate the lookups accordingly. The ternary expressions stop the accumulation as soon as a set bit has been found: an accumulated value of exactly 8, 16 or 24 means that no set bit has been found so far, so the next byte must be examined.
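
The same approach could be ported to Java roughly as follows. This is a sketch under a few assumptions: the table is generated at class-load time instead of being written out by hand, and the names `ClzLookup`, `clz8` and `clz32` are made up for illustration. Java has no unsigned int, so the unsigned shift `>>>` and an `& 0xFF` mask stand in for the C version's unsigned arithmetic:

```java
final class ClzLookup {
    // CLZ_LUT[v] = leading zeros of the 8-bit value v (8 for v == 0).
    private static final byte[] CLZ_LUT = new byte[256];

    static {
        CLZ_LUT[0] = 8;
        for (int v = 1; v < 256; v++) {
            int n = 0;
            // Walk down from bit 7 until the first set bit is reached.
            for (int mask = 0x80; (v & mask) == 0; mask >>>= 1) {
                n++;
            }
            CLZ_LUT[v] = (byte) n;
        }
    }

    /** Leading zeros of a single byte, treated as unsigned. */
    static int clz8(byte b) {
        return CLZ_LUT[b & 0xFF];
    }

    /** Leading zeros of a 32-bit value, one byte at a time from the top. */
    static int clz32(int val) {
        int accum = CLZ_LUT[val >>> 24];
        // Move on to the next byte only while every byte so far was zero.
        if (accum == 8)  accum += CLZ_LUT[(val >>> 16) & 0xFF];
        if (accum == 16) accum += CLZ_LUT[(val >>> 8) & 0xFF];
        if (accum == 24) accum += CLZ_LUT[val & 0xFF];
        return accum;
    }
}
```

For the single-byte case in the question, only `clz8` is needed; the highest set bit index is then `7 - clz8(b)` for a nonzero byte.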

Also, some architectures have hardware support for this as an instruction. The assembly mnemonic is often 'CLZ' or 'BSR', short for "Count Leading Zeroes" and "Bit Scan Reverse" respectively. Note that BSR reports the index of the highest set bit rather than the number of leading zeroes.

#4

If you assume that Long.numberOfTrailingZeros is fast (i.e. JIT compiled/optimized to use a single ASM instruction when available), then why can't you simply do something like this:

Math.min(8, Long.numberOfTrailingZeros(val))

where val is your byte value converted to a long (masked with 0xFF so that it is not sign-extended). The min clamps the result to 8 for a zero byte, for which Long.numberOfTrailingZeros returns 64. This also assumes that min() is optimized to use asm select or min instructions.

Theoretically, on a machine that supports it, these operations could be JIT compiled to two assembler instructions.
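
A small Java sketch of this idea, written with `Math.min` so that a zero byte (for which `Long.numberOfTrailingZeros` returns 64) is clamped to 8. The second method is an assumed alternative, not part of the answer: it sets a sentinel bit just above the byte so the clamp is not needed. Class and method names are illustrative only:

```java
final class ByteTrailingZeros {
    /** Clamp the 64-bit count to 8 so that a zero byte reports 8. */
    static int viaMin(byte b) {
        long val = b & 0xFFL; // widen without sign extension
        return Math.min(8, Long.numberOfTrailingZeros(val));
    }

    /** Same result without min: a sentinel at bit 8 stops the scan there. */
    static int viaSentinel(byte b) {
        return Long.numberOfTrailingZeros((b & 0xFFL) | 0x100L);
    }

    public static void main(String[] args) {
        System.out.println(viaMin((byte) 0b11000100));      // 2
        System.out.println(viaSentinel((byte) 0b11000100)); // 2
        System.out.println(viaMin((byte) 0));               // 8
    }
}
```

Which variant is faster, if either, would have to be measured on the target JVM.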
