What is more efficient in terms of memory and CPU usage: an array of booleans or a BitSet? Specific BitSet methods are not used, only get/set/clear (==, =, and Arrays.fill respectively for an array).
7 answers
#1
33
From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):
BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but they are clearly lower.
boolean[] is more CPU efficient except for very large sizes, where the two are about even. For example, at a size of one million, boolean[] is about four times faster (6 ms vs 27 ms); at ten million and a hundred million they are about even.
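The benchmark code itself is not shown in the answer, so the following is only a minimal sketch of what such a comparison might look like. The class name SieveDemo, its method names, and the timing loop are assumptions made for illustration; the two sieves differ only in whether they use boolean[] or BitSet for get and set.

```java
import java.util.BitSet;

class SieveDemo {
    // Sieve of Eratosthenes backed by boolean[]; returns the number of primes <= n.
    static int countPrimesWithArray(int n) {
        if (n < 2) return 0;
        boolean[] composite = new boolean[n + 1];
        int count = 0;
        for (int i = 2; i <= n; i++) {
            if (!composite[i]) {                          // "get"
                count++;
                for (long j = (long) i * i; j <= n; j += i) {
                    composite[(int) j] = true;            // "set"
                }
            }
        }
        return count;
    }

    // The same sieve backed by java.util.BitSet.
    static int countPrimesWithBitSet(int n) {
        if (n < 2) return 0;
        BitSet composite = new BitSet(n + 1);
        int count = 0;
        for (int i = 2; i <= n; i++) {
            if (!composite.get(i)) {
                count++;
                for (long j = (long) i * i; j <= n; j += i) {
                    composite.set((int) j);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long bestArray = Long.MAX_VALUE;
        long bestBitSet = Long.MAX_VALUE;
        // Take the best of 10 runs so the JIT has a chance to warm up.
        for (int run = 0; run < 10; run++) {
            long t0 = System.nanoTime();
            countPrimesWithArray(n);
            long t1 = System.nanoTime();
            countPrimesWithBitSet(n);
            long t2 = System.nanoTime();
            bestArray = Math.min(bestArray, t1 - t0);
            bestBitSet = Math.min(bestBitSet, t2 - t1);
        }
        System.out.println("boolean[] best: " + bestArray / 1_000_000.0 + " ms");
        System.out.println("BitSet    best: " + bestBitSet / 1_000_000.0 + " ms");
    }
}
```

Timings from a hand-rolled loop like this vary a lot between JVMs and machines, so treat them as a rough indication rather than a replacement for a proper benchmark harness.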
#2
36
- Boolean[] uses about 4-20 bytes per boolean value.
- boolean[] uses about 1 byte per boolean value.
- BitSet uses about 1 bit per boolean value.
Memory size might not be an issue for you in which case boolean[] might be simpler to code.
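To make the per-value figures above concrete, here is a rough sketch that puts the get/set/clear operations from the question side by side and estimates the payload of each structure. The class FlagStorageDemo and its helper methods are hypothetical, and the estimate ignores object headers and assumes one byte per boolean[] element, which is typical for HotSpot but not guaranteed by the language specification.

```java
import java.util.Arrays;
import java.util.BitSet;

class FlagStorageDemo {
    // Approximate payload of a boolean[]: assumes 1 byte per element (not guaranteed by the spec).
    static long arrayPayloadBytes(boolean[] flags) {
        return flags.length;
    }

    // BitSet.size() reports the number of bits of storage currently allocated.
    static long bitSetPayloadBytes(BitSet bits) {
        return bits.size() / 8;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        boolean[] flags = new boolean[n];
        BitSet bits = new BitSet(n);

        flags[5] = true;            // set
        bits.set(5);

        boolean a = flags[5];       // get
        boolean b = bits.get(5);

        System.out.println("get agrees: " + (a == b));
        System.out.println("boolean[] payload ~ " + arrayPayloadBytes(flags) + " bytes");
        System.out.println("BitSet payload    ~ " + bitSetPayloadBytes(bits) + " bytes");

        Arrays.fill(flags, false);  // clear everything
        bits.clear();
    }
}
```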
#3
4
A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example, 00000001 could be squeezed down by frequency to something equivalent to {(7)0, (1)1}. A more "randomized" string 00111010 would require a more complex representation, e.g. {(2)0, (3)1, (1)0, (1)1, (1)0}, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyond BitSet.
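The {(count)bit, ...} notation in this answer describes runs of identical bits, which is closer to run-length encoding than to Huffman coding proper. As a small sketch of that idea only (the class RunLengthDemo and its encode method are made up for illustration), the following reproduces the two representations quoted above:

```java
class RunLengthDemo {
    // Encodes a string of '0'/'1' characters as runs, e.g. "00000001" -> "{(7)0, (1)1}".
    static String encode(String bits) {
        StringBuilder out = new StringBuilder("{");
        int i = 0;
        while (i < bits.length()) {
            char symbol = bits.charAt(i);
            int runLength = 1;
            // Extend the run while the next character matches the current symbol.
            while (i + runLength < bits.length() && bits.charAt(i + runLength) == symbol) {
                runLength++;
            }
            if (out.length() > 1) {
                out.append(", ");
            }
            out.append('(').append(runLength).append(')').append(symbol);
            i += runLength;
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("00000001")); // {(7)0, (1)1}
        System.out.println(encode("00111010")); // {(2)0, (3)1, (1)0, (1)1, (1)0}
    }
}
```

The second input produces five runs for eight bits, which illustrates why data without long runs can end up larger after encoding than a plain BitSet.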
#4
4
It depends, as always. Yes, BitSet is more memory efficient, but as soon as you require multithreaded access, boolean[] might be the better choice. For example, when computing primes you only ever set a boolean to true, so you don't really need synchronization. Hans Boehm has written a paper about this, and the same technique can be used for marking nodes in a graph.
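A minimal sketch of the multithreaded case, assuming two worker threads that each only ever write true into their own elements of a shared boolean[]. The class ConcurrentMarkDemo and its methods are hypothetical. It relies on two guarantees: Java does not allow word tearing between adjacent array elements, and Thread.join() makes the workers' writes visible to the joining thread. A BitSet would need external synchronization here, because neighbouring bits share a long and setting one is a read-modify-write of that whole word.

```java
class ConcurrentMarkDemo {
    // Two threads mark interleaved elements of the same boolean[] without locks.
    static boolean[] markInParallel(int size) {
        boolean[] marked = new boolean[size];
        Thread evens = new Thread(() -> {
            for (int i = 0; i < size; i += 2) marked[i] = true;
        });
        Thread odds = new Thread(() -> {
            for (int i = 1; i < size; i += 2) marked[i] = true;
        });
        evens.start();
        odds.start();
        try {
            // join() establishes happens-before, so every write is visible afterwards.
            evens.join();
            odds.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return marked;
    }

    static int countTrue(boolean[] flags) {
        int count = 0;
        for (boolean flag : flags) {
            if (flag) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] marked = markInParallel(1_000_000);
        System.out.println("marked: " + countTrue(marked));
    }
}
```

Note that this only covers the write-then-join pattern; threads that read the flags while others are still writing them would need volatile semantics or another form of synchronization to see up-to-date values.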
#5
3
As for memory, the documentation for a BitSet has pretty clear implications. In particular:
Every bit set has a current size, which is the number of bits of space currently in use by the bit set. Note that the size is related to the implementation of a bit set, so it may change with implementation. The length of a bit set relates to logical length of a bit set and is defined independently of implementation.
The source for Java library classes is openly available and one can easily check this for themselves. In particular:
    /** The internal field corresponding to the serialField "bits". */
    private long[] words;
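To show what backing a bit set with a long[] means in practice, here is a simplified sketch of the packing scheme: bit i lives in word i >> 6, at the position given by the low six bits of i. The class PackedBits is a hypothetical illustration, not the actual java.util.BitSet source; it has a fixed capacity and does no bounds checking or growth.

```java
class PackedBits {
    private final long[] words;

    // Allocates enough 64-bit words to hold nbits bits.
    PackedBits(int nbits) {
        words = new long[(nbits + 63) >>> 6];
    }

    boolean get(int bitIndex) {
        // For long shifts Java uses only the low 6 bits of the shift distance.
        return (words[bitIndex >> 6] & (1L << bitIndex)) != 0;
    }

    void set(int bitIndex) {
        words[bitIndex >> 6] |= 1L << bitIndex;
    }

    void clear(int bitIndex) {
        words[bitIndex >> 6] &= ~(1L << bitIndex);
    }

    int wordCount() {
        return words.length;
    }

    public static void main(String[] args) {
        PackedBits bits = new PackedBits(130);
        bits.set(64);
        System.out.println(bits.wordCount()); // 3 words cover 130 bits
        System.out.println(bits.get(64));     // true
        System.out.println(bits.get(63));     // false
    }
}
```

Every get or set costs a shift and a mask on top of the array access, which is one plausible reason a plain boolean[] can be faster for small sizes even though it uses about eight times the memory.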
As for speed, it depends on what one is doing. In general, don't think about speed ahead of time; use whichever tool makes the most sense semantically and leads to the clearest code. Optimize only after observing that performance requirements aren't met and identifying bottlenecks.
Coming to SO and asking if A is faster than B is silly for many reasons, including but certainly not limited to:
- It depends on the application, which nobody responding generally has access to. Analyze and profile it in the context it is being used in. Be sure that it's a bottleneck that's actually worth optimizing.
- Questions like this that ask about speed generally show that the OP thinks they care about efficiency but wasn't willing to profile and didn't define performance requirements. Under the surface, that's usually a red flag that the OP is headed down the wrong path.
I know this is an old question, but it came up recently, and I believe this is worth adding.
#6
1
Going from Java to CPU is totally VM specific. For instance, it used to be that a boolean was actually implemented as a 32-bit value (and quite probably still is to this day).
Unless you know it is going to matter, you are better off writing the code to be clear, profiling it, and then fixing the parts that are slow or consume a lot of memory.
You can do this as you go. For instance I once decided to not call .intern() on Strings because when I ran the code in the profiler it slowed it down too much (despite using less memory).
#7
-1
I believe that a BitSet is more memory- and CPU-efficient, as it can internally pack the bits into ints, longs, or native data types, whereas a boolean[] requires a byte for each bit of data. Additionally, if you were to use the other methods (and, or, etc.), you would find that the BitSet is more efficient, as there is no need to iterate through every element of an array; bitwise math is used instead.
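The point about bulk operations can be sketched as follows. BitSet.and is a real library method that combines the operands a whole word at a time, while the boolean[] version has to visit every element. The class BulkAndDemo and its two helper methods are assumptions for illustration; the array version assumes both inputs have the same length.

```java
import java.util.BitSet;

class BulkAndDemo {
    // Element-by-element AND over two arrays of equal length.
    static boolean[] and(boolean[] a, boolean[] b) {
        boolean[] result = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] && b[i];
        }
        return result;
    }

    // Word-at-a-time AND; clones first because BitSet.and mutates its receiver.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(3); a.set(5);
        BitSet b = new BitSet();
        b.set(3); b.set(5); b.set(7);
        System.out.println(and(a, b)); // {3, 5}
    }
}
```

This advantage only applies to the bulk methods; the question explicitly restricts itself to get, set, and clear, where the benchmark in answer #1 found boolean[] faster at small sizes.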