Can I find the max/min value in an unsorted array in sub-linear time?

Is it possible? If not, given an array of size n, how do I know if it's better to just sort the array?

4 solutions

#1

With just the unsorted array, there is no way to do this in sub-linear time. Since you don't know which element is the largest and smallest, you have to look at them all, hence linear time.

Any general-purpose comparison sort will be worse than that, typically O(n log n), so it will be "better" to do the linear scan.
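
For illustration, the linear scan could look something like the sketch below. The function name min_max is made up for this example, and it assumes a non-empty, indexable sequence of mutually comparable values.

```python
def min_max(values):
    """Return (smallest, largest) from a non-empty sequence in one pass."""
    if not values:
        raise ValueError("min_max() needs at least one value")
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:        # new smallest value seen so far
            lo = v
        elif v > hi:      # new largest value seen so far
            hi = v
    return lo, hi


print(min_max([7, 2, 9, 4, 9, 1]))  # (1, 9)
```

Every element is looked at exactly once, which is where the O(n) comes from.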

There are other ways to speed up the process if you're allowed to store more information. You can store the minimum and maximum using the following rules:

When adding a value to an empty list, set min and max to that value. Constant time O(1).

When adding a value to a non-empty list, set min or max to that value if appropriate. Constant time O(1).

When deleting a value from the list, set min or max to 'unknown' if the value being deleted is equal to the current min or max. Constant time O(1). You can also make this more efficient if you store both the min/max and the counts of them. In other words, if your list has seven copies of the current maximum and you delete one, there's no need to set the maximum to unknown, just decrement the count. Only when the count reaches zero should you mark it unknown.

If you ask for the minimum or maximum for an empty list, return some special value. Constant time O(1).

If you ask for the minimum or maximum for a non-empty list where the values are known, return the relevant value. Constant time O(1).

If you ask for the minimum or maximum for a non-empty list where the values are unknown, do a linear search to discover them then return the relevant value. Linear time O(n).

Done that way, the vast majority of min/max retrievals are probably constant time. Only after you've removed a value that was the min or max does the next retrieval require linear time, and only for that one retrieval.

The next retrieval after that will again be constant time since you've calculated and stored them, assuming you don't remove the min/max value in the interim again.

Pseudo-code for just the maximum could be as simple as:

def initList ():
    list = []
    maxval = 0
    maxcount = 0

In the initialisation code above, we simply create the list along with a maximum value and its count. It would be easy to add the minimum value and count as well.

To add to the list, we follow the rules above:

def addToList (val):
    list.add (val) error on failure

    # Detect adding to empty list.
    if list.size = 1:
        maxval = val
        maxcount = 1
        return

    # If no maximum known at this point, calc later.
    if maxcount = 0:
        return

    # Adding less than current max, ignore.
    if val < maxval:
        return

    # Adding another of current max, bump up count.
    if val = maxval:
        maxcount += 1
        return

    # Otherwise, new max, set value and count.
    maxval = val
    maxcount = 1

Deleting is quite simple. Just delete the value. If it was the maximum value, decrement the count of those maximum values. Note that this only makes sense if you know the current maximum - if not, you were already in the state where you were going to have to calculate it so just stay in that state.

The count becoming zero will indicate the maximum is now unknown (you've deleted them all):

def delFromList (val):
    list.del (val) error on failure

    # Decrement count if max is known and the value is max.
    # The count will become 0 when all maxes deleted.
    if maxcount > 0 and val = maxval:
        maxcount -= 1

Getting the maximum is then a matter of knowing when it needs to be calculated (when maxcount is zero). If it doesn't need to be calculated, just return it:

def getMax ():
    # raise exception if list empty.
    error if list.size = 0

    # If maximum unknown, calculate it on demand.
    if maxcount = 0:
        maxval = list[0]
        for each val in list:
            if val = maxval:
                maxcount += 1
            elsif val > maxval:
                maxval = val
                maxcount = 1

    # Now it is known, just return it.
    return maxval

All that pseudo-code uses seemingly global variables, list, maxval and maxcount. In a properly engineered system, they would of course be instance variables so that you can run multiple lists side-by-side.
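
As a rough idea of what the instance-variable version might look like, here is a minimal Python sketch of the same rules. The class name MaxTrackingList and its method names are invented for this example, and it tracks only the maximum. Note that removing from a plain Python list is itself a linear operation, so the constant-time claims apply to the max bookkeeping only.

```python
class MaxTrackingList:
    """List wrapper that caches the maximum and how many copies of it exist."""

    def __init__(self):
        self.items = []
        self.maxval = None   # only meaningful while maxcount > 0
        self.maxcount = 0    # 0 means "maximum unknown" (or the list is empty)

    def add(self, val):
        self.items.append(val)
        if len(self.items) == 1:        # first element is trivially the max
            self.maxval, self.maxcount = val, 1
        elif self.maxcount == 0:        # max unknown, recompute lazily later
            return
        elif val == self.maxval:        # another copy of the current max
            self.maxcount += 1
        elif val > self.maxval:         # strictly larger, so a new max
            self.maxval, self.maxcount = val, 1

    def delete(self, val):
        self.items.remove(val)          # raises ValueError if val is absent
        if self.maxcount > 0 and val == self.maxval:
            self.maxcount -= 1          # reaching 0 marks the max as unknown

    def get_max(self):
        if not self.items:
            raise ValueError("get_max() on an empty list")
        if self.maxcount == 0:          # O(n) rescan, only when needed
            self.maxval = max(self.items)
            self.maxcount = self.items.count(self.maxval)
        return self.maxval


tracked = MaxTrackingList()
for v in [3, 9, 4, 9]:
    tracked.add(v)
tracked.delete(9)
print(tracked.get_max())  # 9, answered from the cache (one copy remains)
tracked.delete(9)
print(tracked.get_max())  # 4, found by a rescan
```

Each instance carries its own items, maxval and maxcount, so several lists can be tracked side by side.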

#2

Given the generic question:

Can I find the max/min value in an unsorted Array in sub linear time?

I can't imagine any mechanism that would make this happen.

However, if you keep a reference to the min and max value and update the values on every insert / append / replace operation, the amortized cost of min / max lookups can be very cheap.

Sorting the array is very expensive compared to a simple linear scan to find the min and max, so only sort if there is some other benefit. (Of course, keeping the array sorted as you go, the way insertion sort places each new element, gives very similar properties to updating the min and max values on every insert / append / replace operation, so it might be acceptable enough.)
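
To make the parenthetical a bit more concrete, here is a small sketch of keeping a list sorted on every insert, using Python's standard bisect module. The variable name sorted_values is just for this example. Each insert still costs O(n) because elements have to be shifted, but the min and max are then always at the two ends.

```python
import bisect

sorted_values = []
for v in [7, 2, 9, 4]:
    bisect.insort(sorted_values, v)   # insert v at its sorted position

print(sorted_values)                        # [2, 4, 7, 9]
print(sorted_values[0], sorted_values[-1])  # 2 9
```

Whether this beats simply tracking min and max depends on whether you need the sorted order for anything else.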

#3

For an unsorted array, finding the min/max is O(N), and there is no way to outperform that. For a sorted array it is O(1), but the sort itself is O(N log N), so if you only need the min/max once or a few times, sorting is not worthwhile. But if you do this operation many times, look at search structures such as a red-black tree or a heap to reorganize the data and avoid a linear-time search.
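
As one possible illustration of the heap suggestion, Python's standard heapq module keeps a list arranged as a binary min-heap. This is only a sketch with made-up sample data. heapq gives you the minimum only, so for the maximum you would need something extra, for example a second heap holding negated values.

```python
import heapq

heap = [7, 2, 9, 4]
heapq.heapify(heap)            # O(n) one-time reorganisation

print(heap[0])                 # 2, the minimum sits at index 0
heapq.heappush(heap, 1)        # O(log n) insert
print(heap[0])                 # 1
print(heapq.heappop(heap))     # 1, removes and returns the minimum
print(heap[0])                 # 2
```

Reading the minimum is O(1), and inserts and removals are O(log n), so repeated lookups no longer need a linear scan.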

#4

In this complete answer (with C++ code) that I found here - What is the best way to get the minimum or maximum value from an Array of numbers - it clearly shows that the total number of comparisons needed to find both the minimum and the maximum is 3n/2 - 2 if n is even (and for odd n the leading constant is still 3/2).

So after ignoring the two constants (the coefficient 3/2 and the -2), which don't matter for large enough n, it clearly belongs to O(n). It is linear in terms of complexity, but in terms of efficiency (if I can say so) it needs about 1.5n comparisons, compared with roughly 2n for finding the min and max separately, which is excellent.
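
The linked C++ code isn't reproduced here, but the pairwise idea behind that count can be sketched in Python roughly as follows. The function name min_max_pairwise and the explicit comparison counter are additions for this example, not part of the linked answer.

```python
def min_max_pairwise(values):
    """Return (smallest, largest, comparisons) using pairwise comparisons."""
    n = len(values)
    if n == 0:
        raise ValueError("min_max_pairwise() needs at least one value")
    comparisons = 0
    if n % 2 == 0:
        # Even length: seed min and max from the first pair (1 comparison).
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    else:
        # Odd length: seed both from the first element (0 comparisons).
        lo = hi = values[0]
        start = 1
    for i in range(start, n, 2):
        a, b = values[i], values[i + 1]
        comparisons += 3          # one within the pair, one vs lo, one vs hi
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_max_pairwise([7, 2, 9, 4, 9, 1, 5, 3]))  # (1, 9, 10)
```

For n = 8 that is 1 + 3 * 3 = 10 comparisons, which matches 3n/2 - 2. Only the smaller of each pair is checked against the minimum and only the larger against the maximum, which is where the saving comes from.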
