Fastest way to calculate the sum of bits in a byte array

Time: 2022-03-14 03:12:00

I have two byte arrays of the same length. I need to XOR each pair of corresponding bytes and then calculate the sum of the set bits in the result.

For example:

11110000^01010101 = 10100101 -> so 1+1+1+1 = 4

I need to do the same operation for each element in the byte arrays.

9 solutions

#1


11  

Use a lookup table. There are only 256 possible values after XORing, so it's not exactly going to take a long time to build. Unlike izb's solution, though, I wouldn't suggest manually typing in all the values; compute the lookup table once at startup using one of the looping answers.

For example:

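using System.Linq;

public static class ByteArrayHelpers
{
    // Bit counts for all 256 byte values, computed once when the class is first used.
    private static readonly int[] LookupTable =
        Enumerable.Range(0, 256).Select(CountBits).ToArray();

    // Counts the set bits in the low 8 bits of value.
    private static int CountBits(int value)
    {
        int count = 0;
        for (int i = 0; i < 8; i++)
        {
            count += (value >> i) & 1;
        }
        return count;
    }

    // XORs every byte of the array together, then looks up the bit count of the result.
    public static int CountBitsAfterXor(byte[] array)
    {
        int xor = 0;
        foreach (byte b in array)
        {
            xor ^= b;
        }
        return LookupTable[xor];
    }
}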

(You could make it an extension method if you really wanted...)

Note the use of byte[] in the CountBitsAfterXor method - you could make it an IEnumerable<byte> for more generality, but iterating over an array (which is known to be an array at compile-time) will be faster. Probably only microscopically faster, but hey, you asked for the fastest way :)

I would almost certainly actually express it as

public static int CountBitsAfterXor(IEnumerable<byte> data)

in real life, but see which works better for you.

Also note the type of the xor variable as an int. In fact, there's no XOR operator defined for byte values, and if you made xor a byte it would still compile due to the nature of compound assignment operators, but it would be performing a cast on each iteration - at least in the IL. It's quite possible that the JIT would take care of this, but there's no need to even ask it to :)
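Since the question involves two arrays, here is a minimal sketch of how the same lookup table could be applied pairwise. The names XorBitCounter and CountXorBits are hypothetical and not part of the answer above, and the sketch assumes both arrays have the same length, as the question states.

```csharp
using System;
using System.Linq;

public static class XorBitCounter // hypothetical name, for illustration only
{
    // Bit counts for every possible byte value, computed once.
    private static readonly int[] LookupTable =
        Enumerable.Range(0, 256).Select(v => CountBits(v)).ToArray();

    // Counts the set bits in the low 8 bits of value.
    private static int CountBits(int value)
    {
        int count = 0;
        for (int i = 0; i < 8; i++)
        {
            count += (value >> i) & 1;
        }
        return count;
    }

    // Sums the set bits of left[i] ^ right[i] over all positions.
    public static int CountXorBits(byte[] left, byte[] right)
    {
        if (left.Length != right.Length)
        {
            throw new ArgumentException("Arrays must have the same length.");
        }

        int total = 0;
        for (int i = 0; i < left.Length; i++)
        {
            // The XOR of two bytes is an int in the range 0..255,
            // so it can index the table directly.
            total += LookupTable[left[i] ^ right[i]];
        }
        return total;
    }
}
```

With the question's example, XorBitCounter.CountXorBits(new byte[] { 0xF0 }, new byte[] { 0x55 }) returns 4.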

#2


9  

Fastest way would probably be a 256-element lookup table...

int[] lut =
{
    /*0x00*/ 0,
    /*0x01*/ 1,
    /*0x02*/ 1,
    /*0x03*/ 2,
    ...
    /*0xFE*/ 7,
    /*0xFF*/ 8
};

e.g.

11110000^01010101 = 10100101 -> lut[165] == 4

#3


5  

This is more commonly referred to as bit counting. There are literally dozens of different algorithms for doing this. Here is one site which lists a few of the more well known methods. There are even CPU specific instructions for doing this.

Theoretically, Microsoft could add a BitArray.CountSetBits function that gets JITed with the best algorithm for that CPU architecture. I, for one, would welcome such an addition.
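For what it's worth, later runtimes (.NET Core 3.0 and newer) do expose a population-count primitive, System.Numerics.BitOperations.PopCount, which is documented as using a hardware intrinsic where one is available. A minimal sketch, assuming such a runtime; PopCountExample and CountXorBits are hypothetical names:

```csharp
using System;
using System.Numerics;

public static class PopCountExample // hypothetical name, for illustration only
{
    // Sums the set bits of left[i] ^ right[i] over the common length of both arrays.
    public static int CountXorBits(byte[] left, byte[] right)
    {
        int n = Math.Min(left.Length, right.Length);
        int total = 0;
        for (int i = 0; i < n; i++)
        {
            // The XOR of two bytes is an int in 0..255; PopCount takes a uint.
            total += BitOperations.PopCount((uint)(left[i] ^ right[i]));
        }
        return total;
    }
}
```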

#4


3  

As I understood it you want to sum the bits of each XOR between the left and right bytes.

for (int b = 0; b < left.Length; b++) {
  int num = left[b] ^ right[b];
  int sum = 0;

  for (int i = 0; i < 8; i++) {
    sum += (num >> i) & 1;
  }

   // do something with sum maybe?
}

#5


2  

I'm not sure if you mean sum the bytes or the bits. To sum the bits within a byte, this should work:

int nSum = 0;
for (int i=0; i<=7; i++)
{
   nSum += (byte_val>>i) & 1;
}

You would then need the xoring, and array looping around this, of course.

#6


1  

The following should do

int BitXorAndSum(byte[] left, byte[] right) {
  int sum = 0;
  for ( var i = 0; i < left.Length; i++) { 
    sum += SumBits((byte)(left[i] ^ right[i]));
  }
  return sum;
}

int SumBits(byte b) {
  var sum = 0;
  for (var i = 0; i < 8; i++) {
    sum += (0x1) & (b >> i);
  }
  return sum;
}

#7


1  

This could be rewritten to process a ulong at a time through an unsafe pointer, but the byte version is easier to understand:

static int BitCount(byte num)
{
    // 0x5 = 0101 (bit) 0x55 = 01010101
    // 0x3 = 0011 (bit) 0x33 = 00110011
    // 0xF = 1111 (bit) 0x0F = 00001111
    uint count = num;
    count = ((count >> 1) & 0x55) + (count & 0x55);
    count = ((count >> 2) & 0x33) + (count & 0x33);
    count = ((count >> 4) & 0x0F) + (count & 0x0F);
    return (int)count;
}
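As a rough sketch of the wider variant mentioned above, the same masking idea can be applied to 64 bits at once. This version avoids unsafe code by reading 8-byte chunks with BitConverter.ToUInt64, which is a swapped-in choice rather than the pointer approach the answer alludes to. WideBitCount and CountXorBits are hypothetical names.

```csharp
using System;

public static class WideBitCount // hypothetical name, for illustration only
{
    // Parallel bit count over 64 bits: pair sums, nibble sums, byte sums,
    // then a multiply that accumulates all byte sums into the top byte.
    private static int BitCount(ulong x)
    {
        unchecked // the multiplication is meant to wrap around
        {
            x = x - ((x >> 1) & 0x5555555555555555UL);
            x = (x & 0x3333333333333333UL) + ((x >> 2) & 0x3333333333333333UL);
            x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FUL;
            return (int)((x * 0x0101010101010101UL) >> 56);
        }
    }

    // Sums the set bits of left[i] ^ right[i] over the common length of both arrays.
    public static int CountXorBits(byte[] left, byte[] right)
    {
        int n = Math.Min(left.Length, right.Length);
        int total = 0;
        int i = 0;

        // Process 8 bytes per iteration while a full chunk remains.
        for (; i + 8 <= n; i += 8)
        {
            ulong x = BitConverter.ToUInt64(left, i) ^ BitConverter.ToUInt64(right, i);
            total += BitCount(x);
        }

        // Handle the remaining 0..7 bytes one at a time.
        for (; i < n; i++)
        {
            total += BitCount((ulong)(left[i] ^ right[i]));
        }
        return total;
    }
}
```

Whether this actually beats a per-byte lookup table depends on the array sizes and the runtime, so it is worth measuring before committing to it.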

#8


0  

A general function to count bits could look like:

int Count1(byte[] a)
{
  int count = 0;
  for (int i = 0; i < a.Length; i++)
  {
    byte b = a[i];
    while (b != 0)
    {
      count++;
      b = (byte)((int)b & (int)(b - 1));
    }
  }
  return count;
}

The fewer 1-bits there are, the faster this runs. It simply loops over each byte and clears the lowest set bit of that byte until the byte becomes 0. The casts are necessary so that the compiler stops complaining about the type widening and narrowing.

Your problem could then be solved by using this:

int Count1Xor(byte[] a1, byte[] a2)
{
  int count = 0;
  for (int i = 0; i < Math.Min(a1.Length, a2.Length); i++)
  {
    byte b = (byte)((int)a1[i] ^ (int)a2[i]);
    while (b != 0)
    {
      count++;
      b = (byte)((int)b & (int)(b - 1));
    }
  }
  return count;
}

#9


0  

A lookup table should be the fastest, but if you want to do it without a lookup table, this will work for bytes in just 10 operations.

public static int BitCount(byte value) {
    int v = value - ((value >> 1) & 0x55);
    v = (v & 0x33) + ((v >> 2) & 0x33);
    return (v + (v >> 4)) & 0x0F;
}

This is a byte version of the general bit counting function described on Sean Eron Anderson's Bit Twiddling Hacks site.
