
Time: 2022-09-23 17:13:05

I've got an array of primitive bytes that can hold arbitrary values. I'm trying to count how often each value occurs in the array in the most efficient/fastest way. Currently I'm using:


HashMap<Byte, Integer> dataCount = new HashMap<>();
for (byte b : data) dataCount.put(b, dataCount.getOrDefault(b, 0) + 1);

This one-liner takes ~500ms to process a byte[] of length 24883200. Using a regular for loop takes at least 600ms.


I've been thinking of constructing a set (since they only contain one of each element) then adding it to a HashMap using Collections.frequency(), but the methods to construct a Set from primitives require several other calls, so I'm guessing it's not as fast.


What would be the fastest way to accomplish counting of occurrences of each item?


I'm using Java 8 and I'd prefer to avoid using Apache Commons if possible.


2 solutions


If it's just bytes, use an array, don't use a map. You do have to use masking to deal with the signedness of bytes, but that's not a big deal.


int[] counts = new int[256];
for (byte b : data) {
    // Mask to map the signed byte (-128..127) onto an index in 0..255
    counts[b & 0xFF]++;
}

Arrays are just so massively compact and efficient that they're almost impossible to beat when you can use them.
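To make the snippet above runnable end to end, here is a minimal sketch. The class name ByteHistogram, the method name count, and the sample data are made up for illustration; only the 256-slot array and the masking come from the answer. It also shows one way to read the counts back out by casting the index to a byte again.

```java
final class ByteHistogram {
    /** Returns a 256-element array; slot i holds the count for unsigned byte value i. */
    static int[] count(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask so negative bytes map to 128..255
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, including negative values.
        byte[] data = {1, -1, 1, 0, -128, 1};
        int[] counts = count(data);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // Casting the index back to byte recovers the original signed value.
                System.out.println((byte) i + " -> " + counts[i]);
            }
        }
    }
}
```

An int counter is enough here because a Java array cannot hold more than Integer.MAX_VALUE elements, so no single count can overflow.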



I would create an array instead of a HashMap, given that you know exactly how many counts you need to keep track of:


int[] counts = new int[256];
for (byte b : data) {
    counts[b & 0xff]++;
}

That way:

  • You never need to do any boxing of either the keys or the values

  • Nothing needs to take a hash code, check for equality etc

  • It's about as memory-efficient as it gets

Note that the & 0xff is used to get a value in the range [0, 255] instead of [-128, 127], so it's suitable as the index into the array.
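As a small illustration of why the mask matters, the sketch below prints a byte before and after masking. The class name MaskDemo and the helper toIndex are hypothetical; Byte.toUnsignedInt is a standard method available since Java 8 and should give the same result as the mask.

```java
final class MaskDemo {
    /** Maps a signed byte (-128..127) to an array index in 0..255. */
    static int toIndex(byte b) {
        return b & 0xff; // b is widened to int with sign extension, then the upper bits are cleared
    }

    public static void main(String[] args) {
        byte b = (byte) 200; // does not fit in a signed byte, so it is stored as -56
        System.out.println(b);                     // -56, would throw if used directly as an index
        System.out.println(toIndex(b));            // 200
        System.out.println(Byte.toUnsignedInt(b)); // 200, equivalent to the mask
    }
}
```

Without the mask, any byte with the high bit set would produce a negative index and an ArrayIndexOutOfBoundsException.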



