Fastest way to find if an int array contains a number

Time: 2021-04-14 01:35:08

This is an odd question. I have an integer array in Java, where each int represents a color. They will either be 0xFFFFFFFF or 0x0. What would be the FASTEST way to find if this array contains ANY values equal to 0xFFFFFFFF?

This is my current code:

int length = w * h;
for (int i = 0; i < length; i++) {
    if (pixels[i] == 0xFFFFFFFF) {
        return true;
    }
}

I have no clue if there is a faster way to do this or not. I imagine you vets could have a trick or two though.

EDIT: Seeing as it is just a dumb array of pixels from Bitmap.getPixels(), there's no way it would be sorted or transformed to another storage structure. Thanks for the input, everyone, it seems like looping through is the best way in this case.

9 answers

#1


11  

No, there is no faster way unless the array of integers is already sorted, which I doubt given it's an array of colours.

Scanning an unsorted array takes linear time, O(n). That is what your loop does, and it exits as soon as a match is found, which is good too.

#2


11  

Without switching to some other data structure, no, there is no better way to find whether the array contains that value. You have to look at all the array elements to see if it's there, since if you don't check some particular location you might miss the one copy of that pixel color.

That said, there are alternative ways that you could solve this problem. Here are a few thoughts on how to speed this up:

  • If every value is guaranteed to be either white or black, you could store two extra boolean values alongside the array representing whether there are white or black pixels. That way, once you've run the scan once, you could just read the booleans back. You could also store a count of the number of white and black pixels along with the array, and then whenever you write a pixel update the count by decrementing the number of pixels of the original color and incrementing the number of pixels of the new color. This would then give you the ability to check if a pixel of a given color exists in O(1) by just seeing if the correct counter is nonzero.

  • Alternatively, if you happen to know something about the image (perhaps where the white and black pixels ought to be), you could consider doing the iteration in a different order. For example, if the pixels you're looking for tend to be clustered in the center of the image, rewriting the loop to check there first might be a good idea since if there are any pixels of that type you'll find them more rapidly. This still has the same worst-case behavior, but for "realistic" images might be much faster.

  • If you have multiple threads available and the array is really huge (millions of elements), you could consider having multiple threads each search a part of the array for the value. This would only be worthwhile if you had a reason to suspect that most of the image was not white, since otherwise a sequential scan would hit a white pixel almost immediately.

  • Since you're just looking for pixels of one color, you might want to consider storing the image as a sparse array, where you store a list of the pixels that happen to be of one color (say, white) and then assume everything else is black. If you expect most images to be a solid color with a few outliers, this might be a very good representation. Additionally, it would give you constant-time lookup of whether any black or white pixels exist - just check if the list of set pixels is empty or consists of the entire image.

  • If the order doesn't matter, you could also store the elements in some container like a hash table, which could give you O(1) lookup of whether or not the element is there. You could also sort the array and then just check the endpoints.

  • As a microoptimization, you could consider always appending to the real image two values - one white pixel and one black pixel - so that you could always iterate until you find the value. This eliminates one of the comparisons from the loop (the check to see if you're in-bounds) and is recommended by some authors for very large arrays.

  • If you assume that most images are a nice mixture of white and black, you could consider probing a few random locations and checking if any of them are the right color. If so, then clearly a pixel of the correct color exists and you're done. Otherwise, run the full linear scan, which keeps the answer exact; if you are okay with getting the wrong answer a small fraction of the time, you could even skip it. For images that are a nice blend of colors, this could save you an enormous amount of time, since you could probe some small number of locations (say, O(log n) of them) and end up avoiding a huge linear scan in many cases. When a probe hits, that is O(log n) work instead of O(n), although the worst case is still linear.

  • If every value is either white or black, you could also consider storing the image in a bitvector. This would shrink the array by a factor of 32, since each 32-bit int becomes a single bit. You could then iterate across the compressed array one machine word (32 or 64 pixels) at a time and see if any word is nonzero to see if any of the pixels are white. This also saves a huge amount of space, and I'd actually suggest doing this since it makes a lot of other operations easy as well.
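
As a rough illustration of the bitvector idea, here is a minimal sketch that uses the standard java.util.BitSet. The class name BitPackedImage and its methods are made up for this example, and it assumes every pixel is either 0x0 or 0xFFFFFFFF. Packing is itself a linear pass, so this only pays off if the image is kept in packed form and queried more than once.

```java
import java.util.BitSet;

// Hypothetical helper for illustration; not part of any Android or JDK API.
class BitPackedImage {
    // Packs a black/white pixel array into one bit per pixel (set bit = white).
    static BitSet pack(int[] pixels) {
        BitSet bits = new BitSet(pixels.length);
        for (int i = 0; i < pixels.length; i++) {
            if (pixels[i] != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    // isEmpty() is true only when no bit is set, i.e. when no pixel is white.
    static boolean containsWhite(BitSet bits) {
        return !bits.isEmpty();
    }

    public static void main(String[] args) {
        int[] pixels = {0x0, 0x0, 0xFFFFFFFF, 0x0};
        System.out.println(containsWhite(pack(pixels)));      // true
        System.out.println(containsWhite(pack(new int[8])));  // false
    }
}
```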

Hope this helps!

#3


2  

It doesn't matter at the bytecode level, but at the native-code level,

if (pixels[i] != 0)

is likely to be a bit faster, given that you're sure only these two values can appear.

#4


1  

If your array is really big, it might be worth it to divide and conquer. That is, assign segments of the data to multiple threads (probably t threads where t is the number of available processor cores). With a sufficiently large data set, the parallelism may amortize the thread startup cost.
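
One low-effort way to try this, assuming a Java 8+ runtime where java.util.stream is available, is a parallel IntStream: the runtime splits the array across the common fork/join pool, so there is no manual thread management. The class name ParallelPixelSearch is only for illustration, and whether this beats the plain loop depends on the array size and on how early a white pixel appears, so it is worth measuring.

```java
import java.util.Arrays;

// Hypothetical example class; the stream calls are standard JDK APIs.
class ParallelPixelSearch {
    // anyMatch is short-circuiting: it can stop once any worker finds a white pixel.
    static boolean containsWhite(int[] pixels) {
        return Arrays.stream(pixels).parallel().anyMatch(p -> p == 0xFFFFFFFF);
    }

    public static void main(String[] args) {
        int[] pixels = new int[4_000_000];          // all 0x0 initially
        System.out.println(containsWhite(pixels));  // false
        pixels[pixels.length - 1] = 0xFFFFFFFF;
        System.out.println(containsWhite(pixels));  // true
    }
}
```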

#5


1  

Here is a simple optimization that helps on large arrays: put the requested value at the end of the array as a sentinel and thus eliminate the explicit bounds check from the loop condition. (templatetypedef has already mentioned this optimization.) This solution saves 25% of loop running time and it is good for large arrays:

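static int indexOfWhite(int[] a) {
    int n = a.length;
    if (n == 0) return -1;
    int tmp = a[n - 1];        // save the last element
    a[n - 1] = 0xFFFFFFFF;     // sentinel: guarantees the loop terminates

    int pos = 0;
    // No "pos < n" test here (the JVM's implicit bounds check on a[pos] still applies)
    while (a[pos] != 0xFFFFFFFF) {
        pos++;
    }

    a[n - 1] = tmp;            // restore the last element

    // A stop at n - 1 only counts if the restored element really is white
    return a[pos] == 0xFFFFFFFF ? pos : -1;
}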

There is a C# implementation with running-time analysis at this address.

#6


0  

The only scope for improving the performance is the comparison. I feel a bitwise operator would be a bit faster than the equality operator. You could do this:

int length = w * h;
for (int i = 0; i < length; i++) {
    // Java does not treat an int as a boolean, so compare the masked value with 0
    if ((pixels[i] & 0xFFFFFFFF) != 0) {
        return true;
    }
}

#7


0  

Can't you check when you insert the color into the array? If so, you could store the index of the array's element which contains the 0xFFFFFFFF color. Since you want "ANY" entry that has such value, this should do the trick :D

If not, your answer has the complexity of O(n) which is the best it could be, since the array isn't (and cannot be, as you say) ordered.
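
If the writes do go through your own code, the bookkeeping could look something like the sketch below. TrackedPixels and its methods are hypothetical names for illustration, and this does not help when the array comes straight out of Bitmap.getPixels(), as in the question's edit.

```java
// Hypothetical wrapper: every write goes through setPixel so the count stays accurate.
class TrackedPixels {
    private static final int WHITE = 0xFFFFFFFF;
    private final int[] pixels;
    private int whiteCount;   // number of elements currently equal to WHITE

    TrackedPixels(int size) {
        pixels = new int[size];   // all 0x0 initially, so whiteCount starts at 0
    }

    void setPixel(int index, int color) {
        if (pixels[index] == WHITE) whiteCount--;   // overwriting a white pixel
        if (color == WHITE) whiteCount++;           // writing a white pixel
        pixels[index] = color;
    }

    // O(1): no scan needed, just look at the counter.
    boolean containsWhite() {
        return whiteCount > 0;
    }

    public static void main(String[] args) {
        TrackedPixels image = new TrackedPixels(16);
        System.out.println(image.containsWhite());  // false
        image.setPixel(3, 0xFFFFFFFF);
        System.out.println(image.containsWhite());  // true
        image.setPixel(3, 0x0);
        System.out.println(image.containsWhite());  // false
    }
}
```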

#8


-1  

Using the built-in foreach is a tad faster than the indexed for, as it eliminates a bounds check:

for(int pix:pixels){
    if(pix!=0)
        return true;
}

#9


-1  

Arrays.asList(...).contains(...)
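
One caveat worth spelling out: with a primitive int[], Arrays.asList does not produce a list of Integers. It produces a List<int[]> whose single element is the array itself, so contains() never finds the pixel value. The sketch below (ContainsDemo is a made-up class name) shows the pitfall next to an IntStream-based one-liner that does work on Java 8+. It is still a linear scan underneath.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical demo class; Arrays.asList and IntStream are standard JDK APIs.
class ContainsDemo {
    // Pitfall: the list holds one int[] element, so an Integer is never "contained".
    static boolean viaAsList(int[] pixels) {
        return Arrays.asList(pixels).contains(0xFFFFFFFF);
    }

    // Works on the primitive values directly and stops at the first match.
    static boolean viaStream(int[] pixels) {
        return IntStream.of(pixels).anyMatch(p -> p == 0xFFFFFFFF);
    }

    public static void main(String[] args) {
        int[] pixels = {0x0, 0xFFFFFFFF};
        System.out.println(viaAsList(pixels));  // false, even though white is present
        System.out.println(viaStream(pixels));  // true
    }
}
```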
