C/ c++位数组或位向量。

时间:2021-03-02 16:36:30

I am learning C/C++ programming & have encountered the usage of 'Bit arrays' or 'Bit Vectors'. Am not able to understand their purpose? here are my doubts -

我正在学习C/ c++编程&遇到了“位数组”或“位向量”的用法。我不能理解他们的目的吗?这是我的疑惑。

  1. Are they used as boolean flags?
  2. 它们是否用作布尔标志?
  3. Can one use int arrays instead? (more memory of course, but..)
  4. 可以使用int数组吗?(当然是更多的记忆,但是……)
  5. What's this concept of Bit-Masking?
  6. 什么是位掩蔽的概念?
  7. If bit-masking is simple bit operations to get an appropriate flag, how do one program for them? is it not difficult to do this operation in head to see what the flag would be, as apposed to decimal numbers?
  8. 如果位掩码是简单的位操作来获取一个适当的标志,那么如何为它们执行一个程序呢?做这个操作是否不困难,我们可以看到,作为十进制数字的标志是什么?

I am looking for applications, so that I can understand better. for Eg -

我正在寻找应用程序,以便我能更好地理解。如-

Q. You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding missing numbers?

问:给定一个包含整数的文件(1到100万)。有一些重复,因此一些数字不见了。找到寻找丢失号码的最快方法?

For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?

对于上述问题,我已经阅读了一些解决方案,告诉我使用位数组。如何将每个整数存储在一个比特中?

5 个解决方案

#1


13  

I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.

我想你已经把数组和数字搞混了,特别是对二进制数字的操作。

I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occured?

我举个例子。假设您有许多错误消息,您希望从函数返回值。现在,你可以给你的错误标上1、2、3、4……这对你的思想来说是有意义的,但是,如果只给出一个数字,你如何计算出发生了哪些错误呢?

Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work base 2 you are manipulating a number like 00000000 where each digit corresponds to a power of 2 multiplied by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. In reverse, the first digit = 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.

现在,试着标出误差1 2 4 8 16…增加两倍的力量。为什么这个工作吗?当你工作的时候,你在操纵一个数字,比如00000000,每个数字对应的是2乘以它从右边的位置。假设误差是1 4 8。那么这可以表示为00001101。反过来说,第一位= 1 * 2 ^ 0,第三个数字1 * 2 ^ 2第四位1 * 2 ^ 3。把它们加起来就等于13。

Now, we are able to test if such an error has occured by applying a bitmask. By example, if you wanted to work out if error 8 has occured, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occured, use a binary and like so:

现在,我们可以通过应用位掩码来测试是否发生了这样的错误。例如,如果您想计算出错误8是否发生,请使用8 = 00001000的位表示。现在,为了提取这个错误是否发生了,请使用二进制,如下所示:

  00001101
& 00001000
= 00001000

I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.

我确定你知道a和它的工作原理,或者可以从上面的数字中推导出来,如果两个数字都是1,那么结果就是1,否则就是0。

Now, in C:

现在,在C:

int func(...)
{
    int retval = 0;

    if ( sometestthatmeans an error )
    {
        retval += 1;
    }


    if ( sometestthatmeans an error )
    {
        retval += 2;
    }
    return retval
}

int anotherfunc(...)
{
    uint8_t x = func(...)

    /* binary and with 8 and shift 3 plaes to the right
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}

Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose xml etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.

现在,实用性。当内存稀疏,协议没有冗长的xml等特权时,将字段分隔为这么多字节是很常见的。在该字段中,您将不同的位(标志、2的幂)赋给特定的含义,并应用二元运算来推断它们是否被设置,然后对它们进行操作。

I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.

我还应该补充的是,二进制操作与计算机的底层电子技术密切相关。想象一下,如果位域与各种电路的输出相对应(带电流或不带电流)。通过使用上述电路的足够组合,你可以……一台电脑。

#2


8  

regarding the usage the bits array :

关于使用位数组:

if you know there are "only" 1 million numbers - you use an array of 1 million bits. in the beginning all bits will be zero and every time you read a number - use this number as index and change the bit in this index to be one (if it's not one already).

如果你知道“只有”100万的数字——你使用的是100万比特的数组。在开始的时候,所有的比特都是0,并且每次你读一个数字——用这个数字作为索引,并把这个索引中的位改变为1(如果不是一个的话)。

after reading all numbers - the missing numbers are the indices of the zeros in the array.

在读取所有的数字之后,丢失的数字是数组中0的索引。

for example, if we had only numbers between 0 - 4 the array would look like this in the beginning: 0 0 0 0 0. if we read the numbers : 3, 2, 2 the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. check the indices of the zeroes: 0,1,4 - those are the missing numbers

例如,如果我们只有0到4之间的数字,数组在开始时是这样的:0 0 0 0 0。如果我们读一下数字:3,2,2,数组看起来是这样的:读3——> 0 0 0 1 0。读3(再次)——> 0 0 0 0。读2——> 0 0 1 1 0。检查零点的指数:0,1,4 -这些是缺失的数字。

BTW, of course you can use integers instead of bits but it may take (depends on the system) 32 times memory

当然,你可以使用整数而不是位元,但它可能需要(取决于系统)32倍的内存。

Sivan

息汪月

#3


3  

Bit Arrays or Bit Vectors can be though as an array of boolean values. Normally a boolean variable needs at least one byte storage, but in a bit array/vector only one bit is needed. This gets handy if you have lots of such data so you save memory at large.

位数组或位向量可以作为布尔值的数组。通常一个布尔变量需要至少一个字节存储,但在位数组/向量中只需要一点。如果你有很多这样的数据,这样你就可以节省大量的内存。

Another usage is if you have numbers which do not exactly fit in standard variables which are 8,16,32 or 64 bit in size. You could this way store into a bit vector of 16 bit a number which consists of 4 bit, one that is 2 bit and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8,8 and 16 bit, so you only have 50% of storage wasted.

另一种用法是,如果你有一些不完全符合标准变量的数字,比如8、16、32或64位。你可以这样存储一个16位的位向量它包含4位,一个是2位,一个是10位大小。通常情况下,你需要使用3个变量,大小为8、8和16位,所以你只浪费了50%的存储空间。

But all these uses are very rarely used in business aplications, the come to use often when interfacing drivers through pinvoke/interop functions and doing low level programming.

但是,在业务应用程序中很少使用所有这些用途,在通过pinvoke/interop函数和进行低级别编程时,常常会出现这种情况。

#4


3  

Bit Arrays of Bit Vectors are used as a mapping from position to some bit value. Yes it's basically the same thing as an array of Bool, but typical Bool implementation is one to four bytes long and it uses too much space.

位向量的位数组被用作从位置到一些位值的映射。是的,它基本上和Bool数组一样,但是典型的Bool实现是1到4个字节,它占用了太多的空间。

We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.

我们可以更有效地存储相同数量的数据,通过使用单词数组和二进制掩蔽操作和转换来存储和检索它们(使用较少的总内存、更少的内存访问、更少的缓存丢失、更少的内存分页交换)。访问单个位的代码仍然非常简单。

There is also some bit field support builtin in C language (you write things like int i:1; to say "only consume one bit") , but it is not available for arrays and you have less control of the overall result (details of implementation depends on compiler and alignment issues).

在C语言中也有一些字段支持builtin(你写的东西像int i:1;说“只消耗一比特”),但是它不能用于数组,并且您对整个结果的控制更少(实现的细节依赖于编译器和对齐问题)。

Below is a possible way to answer to your "search missing numbers" question. I fixed int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could only be made faster using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.

下面是一个可能的方法来回答你的“搜索失踪数字”问题。我将int大小固定为32位,以使事情变得简单,但是可以使用sizeof(int)来编写它,使之便于移植。并且(取决于编译器和目标处理器)代码只能使用>> 5代替/ 32和& 31而不是% 32,但这只是为了给出这个想法。

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

int main(){
    /* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
    {
        printf("writing test file\n");
        int x = 0;
        FILE * f = fopen("testfile.txt", "w");
        for (x=0; x < 1000000; ++x){
            if (x == 765 || x == 777760 || x == 777791){
                continue;
            }
            fprintf(f, "%d\n", x);
        }
        fprintf(f, "%d\n", 57768); /* this one is a duplicate */
        fclose(f);
    }

    uint32_t bitarray[1000000 / 32];

    /* read file containing integers in the range [1,1000000] */
    /* any non number is considered as separator */
    /* the goal is to find missing numbers */
    printf("Reading test file\n");
    {
        unsigned int x = 0;
        FILE * f = fopen("testfile.txt", "r");
        while (1 == fscanf(f, " %u",&x)){
            bitarray[x / 32] |= 1 << (x % 32);
        }
        fclose(f);
    }
    /* find missing number in bitarray */
    {
        int x = 0;
        for (x=0; x < (1000000 / 32) ; ++x){
            int n = bitarray[x];
            if (n != (uint32_t)-1){
                printf("Missing number(s) between %d and %d [%x]\n",
                    x * 32, (x+1) * 32, bitarray[x]);
                int b;
                for (b = 0 ; b < 32 ; ++b){
                    if (0 == (n & (1 << b))){
                        printf("missing number is %d\n", x*32+b);
                    }
                }
            }
        }
    }
}

#5


2  

That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.

它用于位标记存储,以及解析不同的二进制协议字段,其中1个字节被划分为多个位域。这在TCP/IP等协议中得到了广泛的应用,包括ASN.1编码、OpenPGP包等等。

#1


13  

I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.

我想你已经把数组和数字搞混了,特别是对二进制数字的操作。

I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occured?

我举个例子。假设您有许多错误消息,您希望从函数返回值。现在,你可以给你的错误标上1、2、3、4……这对你的思想来说是有意义的,但是,如果只给出一个数字,你如何计算出发生了哪些错误呢?

Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work base 2 you are manipulating a number like 00000000 where each digit corresponds to a power of 2 multiplied by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. In reverse, the first digit = 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.

现在,试着标出误差1 2 4 8 16…增加两倍的力量。为什么这个工作吗?当你工作的时候,你在操纵一个数字,比如00000000,每个数字对应的是2乘以它从右边的位置。假设误差是1 4 8。那么这可以表示为00001101。反过来说,第一位= 1 * 2 ^ 0,第三个数字1 * 2 ^ 2第四位1 * 2 ^ 3。把它们加起来就等于13。

Now, we are able to test if such an error has occured by applying a bitmask. By example, if you wanted to work out if error 8 has occured, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occured, use a binary and like so:

现在,我们可以通过应用位掩码来测试是否发生了这样的错误。例如,如果您想计算出错误8是否发生,请使用8 = 00001000的位表示。现在,为了提取这个错误是否发生了,请使用二进制,如下所示:

  00001101
& 00001000
= 00001000

I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.

我确定你知道a和它的工作原理,或者可以从上面的数字中推导出来,如果两个数字都是1,那么结果就是1,否则就是0。

Now, in C:

现在,在C:

int func(...)
{
    int retval = 0;

    if ( sometestthatmeans an error )
    {
        retval += 1;
    }


    if ( sometestthatmeans an error )
    {
        retval += 2;
    }
    return retval
}

int anotherfunc(...)
{
    uint8_t x = func(...)

    /* binary and with 8 and shift 3 plaes to the right
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}

Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose xml etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.

现在,实用性。当内存稀疏,协议没有冗长的xml等特权时,将字段分隔为这么多字节是很常见的。在该字段中,您将不同的位(标志、2的幂)赋给特定的含义,并应用二元运算来推断它们是否被设置,然后对它们进行操作。

I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.

我还应该补充的是,二进制操作与计算机的底层电子技术密切相关。想象一下,如果位域与各种电路的输出相对应(带电流或不带电流)。通过使用上述电路的足够组合,你可以……一台电脑。

#2


8  

regarding the usage the bits array :

关于使用位数组:

if you know there are "only" 1 million numbers - you use an array of 1 million bits. in the beginning all bits will be zero and every time you read a number - use this number as index and change the bit in this index to be one (if it's not one already).

如果你知道“只有”100万的数字——你使用的是100万比特的数组。在开始的时候,所有的比特都是0,并且每次你读一个数字——用这个数字作为索引,并把这个索引中的位改变为1(如果不是一个的话)。

after reading all numbers - the missing numbers are the indices of the zeros in the array.

在读取所有的数字之后,丢失的数字是数组中0的索引。

for example, if we had only numbers between 0 - 4 the array would look like this in the beginning: 0 0 0 0 0. if we read the numbers : 3, 2, 2 the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. check the indices of the zeroes: 0,1,4 - those are the missing numbers

例如,如果我们只有0到4之间的数字,数组在开始时是这样的:0 0 0 0 0。如果我们读一下数字:3,2,2,数组看起来是这样的:读3——> 0 0 0 1 0。读3(再次)——> 0 0 0 0。读2——> 0 0 1 1 0。检查零点的指数:0,1,4 -这些是缺失的数字。

BTW, of course you can use integers instead of bits but it may take (depends on the system) 32 times memory

当然,你可以使用整数而不是位元,但它可能需要(取决于系统)32倍的内存。

Sivan

息汪月

#3


3  

Bit Arrays or Bit Vectors can be though as an array of boolean values. Normally a boolean variable needs at least one byte storage, but in a bit array/vector only one bit is needed. This gets handy if you have lots of such data so you save memory at large.

位数组或位向量可以作为布尔值的数组。通常一个布尔变量需要至少一个字节存储,但在位数组/向量中只需要一点。如果你有很多这样的数据,这样你就可以节省大量的内存。

Another usage is if you have numbers which do not exactly fit in standard variables which are 8,16,32 or 64 bit in size. You could this way store into a bit vector of 16 bit a number which consists of 4 bit, one that is 2 bit and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8,8 and 16 bit, so you only have 50% of storage wasted.

另一种用法是,如果你有一些不完全符合标准变量的数字,比如8、16、32或64位。你可以这样存储一个16位的位向量它包含4位,一个是2位,一个是10位大小。通常情况下,你需要使用3个变量,大小为8、8和16位,所以你只浪费了50%的存储空间。

But all these uses are very rarely used in business aplications, the come to use often when interfacing drivers through pinvoke/interop functions and doing low level programming.

但是,在业务应用程序中很少使用所有这些用途,在通过pinvoke/interop函数和进行低级别编程时,常常会出现这种情况。

#4


3  

Bit Arrays of Bit Vectors are used as a mapping from position to some bit value. Yes it's basically the same thing as an array of Bool, but typical Bool implementation is one to four bytes long and it uses too much space.

位向量的位数组被用作从位置到一些位值的映射。是的,它基本上和Bool数组一样,但是典型的Bool实现是1到4个字节,它占用了太多的空间。

We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.

我们可以更有效地存储相同数量的数据,通过使用单词数组和二进制掩蔽操作和转换来存储和检索它们(使用较少的总内存、更少的内存访问、更少的缓存丢失、更少的内存分页交换)。访问单个位的代码仍然非常简单。

There is also some bit field support builtin in C language (you write things like int i:1; to say "only consume one bit") , but it is not available for arrays and you have less control of the overall result (details of implementation depends on compiler and alignment issues).

在C语言中也有一些字段支持builtin(你写的东西像int i:1;说“只消耗一比特”),但是它不能用于数组,并且您对整个结果的控制更少(实现的细节依赖于编译器和对齐问题)。

Below is a possible way to answer to your "search missing numbers" question. I fixed int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could only be made faster using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.

下面是一个可能的方法来回答你的“搜索失踪数字”问题。我将int大小固定为32位,以使事情变得简单,但是可以使用sizeof(int)来编写它,使之便于移植。并且(取决于编译器和目标处理器)代码只能使用>> 5代替/ 32和& 31而不是% 32,但这只是为了给出这个想法。

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

int main(){
    /* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
    {
        printf("writing test file\n");
        int x = 0;
        FILE * f = fopen("testfile.txt", "w");
        for (x=0; x < 1000000; ++x){
            if (x == 765 || x == 777760 || x == 777791){
                continue;
            }
            fprintf(f, "%d\n", x);
        }
        fprintf(f, "%d\n", 57768); /* this one is a duplicate */
        fclose(f);
    }

    uint32_t bitarray[1000000 / 32];

    /* read file containing integers in the range [1,1000000] */
    /* any non number is considered as separator */
    /* the goal is to find missing numbers */
    printf("Reading test file\n");
    {
        unsigned int x = 0;
        FILE * f = fopen("testfile.txt", "r");
        while (1 == fscanf(f, " %u",&x)){
            bitarray[x / 32] |= 1 << (x % 32);
        }
        fclose(f);
    }
    /* find missing number in bitarray */
    {
        int x = 0;
        for (x=0; x < (1000000 / 32) ; ++x){
            int n = bitarray[x];
            if (n != (uint32_t)-1){
                printf("Missing number(s) between %d and %d [%x]\n",
                    x * 32, (x+1) * 32, bitarray[x]);
                int b;
                for (b = 0 ; b < 32 ; ++b){
                    if (0 == (n & (1 << b))){
                        printf("missing number is %d\n", x*32+b);
                    }
                }
            }
        }
    }
}

#5


2  

That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.

它用于位标记存储,以及解析不同的二进制协议字段,其中1个字节被划分为多个位域。这在TCP/IP等协议中得到了广泛的应用,包括ASN.1编码、OpenPGP包等等。