I have a primitive byte array (byte[]) whose elements can take arbitrary values. I'm trying to count how often each value occurs in the array in the most efficient/fastest way. Currently I'm using:
HashMap<Byte, Integer> dataCount = new HashMap<>();
for (byte b : data) dataCount.put(b, dataCount.getOrDefault(b, 0) + 1);
This one-liner takes ~500ms to process a byte[] of length 24883200. Using a regular for loop takes at least 600ms.
I've been thinking of constructing a Set (since a set contains each element only once) and then filling a HashMap using Collections.frequency(), but the methods for constructing a Set from primitives require several other calls, so I'm guessing it wouldn't be as fast.
What would be the fastest way to count the occurrences of each value?
I'm using Java 8 and I'd prefer to avoid using Apache Commons if possible.
2 solutions
#1
If it's just bytes, use an array, don't use a map. You do have to use masking to deal with the signedness of bytes, but that's not a big deal.
int[] counts = new int[256];
for (byte b : data) {
counts[b & 0xFF]++;
}
Arrays are just so massively compact and efficient that they're almost impossible to beat when you can use them.
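For reference, here is the loop above wrapped into a self-contained program. This is only a sketch: the class name ByteHistogram, the method name countBytes, and the random test data are made up for illustration, and the array length is simply the one mentioned in the question.

```java
import java.util.Random;

// Hypothetical wrapper class; only the counting loop comes from the answer above.
final class ByteHistogram {

    /** Counts how often each of the 256 possible byte values occurs in data. */
    static int[] countBytes(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask so that -128..-1 land on indices 128..255
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] data = new byte[24_883_200]; // same length as in the question
        new Random(42).nextBytes(data);     // arbitrary seed, illustrative data only

        int[] counts = countBytes(data);
        for (int index = 0; index < counts.length; index++) {
            if (counts[index] != 0) {
                // Casting the index back to byte recovers the original signed value.
                System.out.println((byte) index + " -> " + counts[index]);
            }
        }
    }
}
```

An int counter is enough here because a Java array cannot hold more than Integer.MAX_VALUE elements, so no single count can overflow.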
#2
I would create an array instead of a HashMap, given that you know exactly how many counts you need to keep track of:
int[] counts = new int[256];
for (byte b : data) {
counts[b & 0xff]++;
}
That way:
- You never need to do any boxing of either the keys or the values
- Nothing needs to take a hash code, check for equality etc
- It's about as memory-efficient as it gets
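If the surrounding code still expects a Map<Byte, Integer>, as in the question, one option is to count in the array and box only once at the end. The following is a sketch under that assumption; the class name ByteCountsAsMap and the method name countToMap are hypothetical. Boxing then happens at most 256 times instead of once per array element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative and not part of the answer above.
final class ByteCountsAsMap {

    /** Counts into a primitive array, then copies the non-zero counts into a map. */
    static Map<Byte, Integer> countToMap(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xff]++; // no boxing and no hashing inside the hot loop
        }

        Map<Byte, Integer> result = new HashMap<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // The cast restores the signed byte value; boxing happens only here.
                result.put((byte) i, counts[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = {3, 3, -1, 3, -1, 0};
        System.out.println(countToMap(data));
    }
}
```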
Note that the & 0xff is used to get a value in the range [0, 255] instead of [-128, 127], so it's suitable as the index into the array.
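To make the effect of the mask concrete, here is a small sketch; the class name ByteMaskDemo and the helper toIndex are made up for illustration. Without the mask, a negative byte would be widened to a negative int and the array access would throw an ArrayIndexOutOfBoundsException. On Java 8, Byte.toUnsignedInt(b) gives the same result as b & 0xff.

```java
// Hypothetical demo class; it only illustrates the masking step described above.
final class ByteMaskDemo {

    /** Maps a signed byte (-128..127) to an array index (0..255). */
    static int toIndex(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(b);                     // prints -16 (signed value)
        System.out.println(toIndex(b));            // prints 240 (masked index)
        System.out.println(Byte.toUnsignedInt(b)); // prints 240 (same as the mask)
    }
}
```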