Quicksort algorithm worst case

Time: 2021-06-29 07:44:26

I found many implementations of the quicksort algorithm, but in the end I decided to stick with this one:

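public static void quickSort(int[] array, int start, int end)
        {
            // A range with fewer than two elements is already sorted.
            if (end <= start) {
                return;
            }
            int pivot = array[start];   // the first element is the pivot
            int i = start + 1;          // array[start+1 .. i-1] holds values < pivot

            // Scan only the current range: j must start at start + 1, not at 1.
            for (int j = start + 1; j <= end; j++) {
                if (pivot > array[j]) {
                    int temp = array[j];
                    array[j] = array[i];
                    array[i] = temp;
                    i++;
                }
            }
            // Move the pivot to its final position, i - 1.
            array[start] = array[i - 1];
            array[i - 1] = pivot;
            quickSort(array, start, i - 2);
            quickSort(array, i, end);
        }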

There are several things I'm confused about.
Why do some people suggest taking the first element as the pivot, others the middle element, and still others the last element? Wouldn't the result be different in each case?
Let's say I'm trying to show why quicksort has O(n^2) worst-case order of growth when the array is already sorted.
I have the following array:
{1, 2, 3, 4, 5, 6}.
If I pick the first element as my pivot, wouldn't it be compared to every other element and then just swapped with itself, which is only O(n)? It would then proceed to the two recursive calls, which are O(log n):

quickSort(array, start, i-2);
quickSort(array, i, end);

So in the end, even if it is an ordered list of integers, it will still be O(n log n)?

If I decided to pick the last element as my pivot, wouldn't it be completely different? It would swap 6 and 1, and hence perform completely different operations than when the pivot was the first element of the array.

I just don't understand why the worst case is O(n^2).

Any help will be greatly appreciated!

3 Answers

#1


6  

The whole point of Quicksort is to find a pivot that partitions the array into two approximately equal pieces. That's where you get the log(n) from.

Suppose there is an array of size n and at each iteration you can partition the array into equal parts. Then we have:

T(n) = 2 * T(n / 2) + O(n)
     = 4 * T(n/4) + 2 * O(n)
.
.
(log(n) steps)
.
.
    = 2^log(n) * T(1) + log(n) * O(n)
    = n * O(1) + O(n * log(n))
    = O(n * log(n))
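
As a rough numeric check of the derivation above, the recurrence can be evaluated directly. This is only a sketch: the class name `BalancedRecurrence`, the method name `cost`, and the choices T(1) = 1 and a partition cost of exactly n are assumptions made for illustration, not part of the answer.

```java
// Illustrative sketch; names and the constant T(1) = 1 are assumptions.
class BalancedRecurrence {
    // Evaluates T(n) = 2 * T(n / 2) + n with T(1) = 1, for n a power of two.
    static long cost(long n) {
        if (n <= 1) {
            return 1;
        }
        return 2 * cost(n / 2) + n;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 20; k += 5) {
            long n = 1L << k;
            // For n = 2^k the recurrence unrolls to n * k + n, i.e. n * log2(n) + n.
            System.out.println("n=" + n + " T(n)=" + cost(n)
                    + " n*log2(n)+n=" + (n * k + n));
        }
    }
}
```

For powers of two, the two printed values agree, which matches the O(n * log(n)) bound.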

Now, if we partition the array into sizes say 1 and n-1, we get:

T(n) = T(1) + T(n-1) + O(n) = T(n-1) + O(n)
     = T(n-2) + O(n-1) + O(n)
     = T(n-3) + O(n-2) + O(n-1) + O(n)
.
.
(n-1) steps
.
.
    = T(1) + O(2) + O(3) + ... + O(n)
    = O(1 + 2 + 3 + .... + n)
    = O(n^2)

In the case you mention, neither of the following calls is O(log(n)). If the array is sorted, one returns in O(1) and the other costs T(n-1), which gives the O(n^2) complexity.

quickSort(array, start, i-2); // should be constant time
quickSort(array, i, end); // should be T(n-1)
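
To make this concrete, here is a small sketch that runs the question's partition scheme on {1, 2, 3, 4, 5, 6} and prints how each call splits its range. It assumes the scan starts at `start + 1`; the class name `SortedInputTrace` and the `comparisons` counter are added for illustration only.

```java
import java.util.Arrays;

// Illustrative sketch; the class name and the counter are assumptions.
class SortedInputTrace {
    static long comparisons = 0;  // element-versus-pivot comparisons over all calls

    // First-element-pivot quicksort in the style of the question's code,
    // assuming the scan starts at start + 1.
    static void quickSort(int[] array, int start, int end) {
        if (end <= start) {
            return;
        }
        int pivot = array[start];
        int i = start + 1;
        for (int j = start + 1; j <= end; j++) {
            comparisons++;
            if (pivot > array[j]) {
                int temp = array[j];
                array[j] = array[i];
                array[i] = temp;
                i++;
            }
        }
        array[start] = array[i - 1];
        array[i - 1] = pivot;
        // Left part is array[start .. i-2], right part is array[i .. end].
        System.out.println("range [" + start + ", " + end + "] -> left size "
                + (i - 1 - start) + ", right size " + (end - i + 1));
        quickSort(array, start, i - 2);
        quickSort(array, i, end);
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6};
        quickSort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data) + " comparisons=" + comparisons);
    }
}
```

On the sorted input every call reports a left size of 0, so the comparisons add up to 5 + 4 + 3 + 2 + 1 = 15, which is n(n-1)/2 for n = 6.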

And as @MarkRansom mentions below, this is not exclusive to sorted arrays. In general, if you choose pivots in such a way that the array is very unevenly partitioned, you'll run into such worst-case complexities. For example, if the array is not sorted but you always choose the maximum (or minimum depending upon your implementation) for the pivot, you'll run into the same problem.

#2


3  

QuickSort starts by moving everything that's got a higher value than the pivot value to the end of the list, and everything that's got a lower value to the beginning of the list.

If the value at your pivot point is the lowest value in the list, then every other value in the list ends up on the high side of the pivot, and just determining that requires O(n) work. If you then pick the second-lowest value, then the third-lowest, and so on, each pass removes only one element, so you end up doing n passes over about n/2 elements on average. O(n²/2) simplifies to O(n²).

Why some people suggest taking the first element as a pivot point, others tell to pick the middle element and some will tell that you should pick the last element as your pivot point, wouldn't it be different?

It's all a matter of trying to guess (without scanning the whole list) which element is most likely to be close to the median of your data set, thereby giving you as close to the best-case behavior as possible.

  • If your data is totally random, then it doesn't matter what you choose--you're equally likely to get a good pivot point, and your chances of consistently choosing the worst pivot point are very slim. Choosing the first or last value is the simplest option that works.
  • If your data is presorted (or mostly so), choosing the middle is probably going to get you one of the best values, whereas choosing the first or last element will consistently give you the worst pivot points.

In real life, the likelihood of dealing with data that's mostly presorted is high enough that it's probably worth the slightly higher complexity of code. The Wikipedia section on this topic may be worth reading.
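
The effect of pivot choice on presorted data can be measured with a sketch like the one below. It reuses a first-element partition in the style of the question's code and, for the middle-pivot variant, first swaps the middle element to the front. The class name `PivotChoiceDemo`, the `useMiddle` flag, and the helper `countOnSorted` are hypothetical names introduced for this example.

```java
// Illustrative sketch; all names here are assumptions, not from the answer.
class PivotChoiceDemo {
    static long comparisons;

    // useMiddle = false: the pivot is a[start].
    // useMiddle = true: the middle element is swapped to the front first,
    // so the same partition loop can be reused.
    static void quickSort(int[] a, int start, int end, boolean useMiddle) {
        if (end <= start) {
            return;
        }
        if (useMiddle) {
            int mid = start + (end - start) / 2;
            int t = a[start];
            a[start] = a[mid];
            a[mid] = t;
        }
        int pivot = a[start];
        int i = start + 1;
        for (int j = start + 1; j <= end; j++) {
            comparisons++;
            if (pivot > a[j]) {
                int t = a[j];
                a[j] = a[i];
                a[i] = t;
                i++;
            }
        }
        a[start] = a[i - 1];
        a[i - 1] = pivot;
        quickSort(a, start, i - 2, useMiddle);
        quickSort(a, i, end, useMiddle);
    }

    // Sorts the already sorted array 0..n-1 and returns the comparison count.
    static long countOnSorted(int n, boolean useMiddle) {
        int[] a = new int[n];
        for (int k = 0; k < n; k++) {
            a[k] = k;
        }
        comparisons = 0;
        quickSort(a, 0, n - 1, useMiddle);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("first-element pivot:  " + countOnSorted(n, false));
        System.out.println("middle-element pivot: " + countOnSorted(n, true));
    }
}
```

With n = 1000 the first-element pivot needs n(n-1)/2 = 499500 comparisons on sorted input, while the middle-element pivot splits each range roughly in half and stays close to n log n.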

#3


3  

Below is a quicksort that uses median of 3 and a variation of Hoare partition that excludes middle elements equal to the pivot (they're already in their final positions). It limits stack space to O(log(n)) by recursing only on the smaller part and then looping back for the larger part. Worst-case time complexity is still O(n^2), but this would require median of 3 to repeatedly choose small or large values. The best case, O(n), occurs when all values are the same (due to excluding middle values equal to the pivot). Worst-case time complexity can be limited to O(n log(n)) by using median of medians, but the overhead makes the average case much slower. I'm wondering if it ends up slower than heap sort. With median of medians it's definitely slower than merge sort, but merge sort needs a second array the same size as, or 1/2 the size of, the original array.

http://en.wikipedia.org/wiki/Median_of_medians

Introsort avoids the O(n^2) worst case by switching to heap sort once the recursion depth exceeds a limit.

http://en.wikipedia.org/wiki/Introsort
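
The idea can be sketched in Java as follows. This is a simplified illustration, not a production introsort: it uses a plain first-element pivot, a depth budget of 2 * floor(log2(n)) (the limit commonly quoted for introsort), and a hand-written heap sort on the subrange. The class name `IntroSortSketch` and all method names are assumptions for this example.

```java
import java.util.Arrays;

// Illustrative sketch of the introsort idea; all names are assumptions.
class IntroSortSketch {
    static void sort(int[] a) {
        if (a.length < 2) {
            return;
        }
        // Depth budget: 2 * floor(log2(n)).
        int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(a.length));
        sort(a, 0, a.length - 1, depthLimit);
    }

    private static void sort(int[] a, int lo, int hi, int depthLimit) {
        if (hi <= lo) {
            return;
        }
        if (depthLimit == 0) {
            heapSort(a, lo, hi);  // budget used up: fall back to O(n log n) heap sort
            return;
        }
        // First-element-pivot partition, deliberately naive to provoke the fallback.
        int pivot = a[lo];
        int i = lo + 1;
        for (int j = lo + 1; j <= hi; j++) {
            if (pivot > a[j]) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, lo, i - 1);  // pivot moves to its final position
        sort(a, lo, i - 2, depthLimit - 1);
        sort(a, i, hi, depthLimit - 1);
    }

    // In-place heap sort of a[lo..hi] using a max-heap whose root is a[lo].
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int root = n / 2 - 1; root >= 0; root--) {
            siftDown(a, lo, root, n);
        }
        for (int last = n - 1; last > 0; last--) {
            swap(a, lo, lo + last);    // move the current maximum to the end
            siftDown(a, lo, 0, last);  // restore the heap on the shrunken prefix
        }
    }

    // root and size are relative to lo.
    private static void siftDown(int[] a, int lo, int root, int size) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) {
                return;
            }
            if (child + 1 < size && a[lo + child + 1] > a[lo + child]) {
                child++;
            }
            if (a[lo + root] >= a[lo + child]) {
                return;
            }
            swap(a, lo + root, lo + child);
            root = child;
        }
    }

    private static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }

    public static void main(String[] args) {
        int[] data = new int[100000];
        for (int k = 0; k < data.length; k++) {
            data[k] = k;  // already sorted: worst case for a first-element pivot
        }
        sort(data);
        System.out.println(Arrays.toString(Arrays.copyOf(data, 5)));
    }
}
```

Because the depth budget caps the quicksort recursion, a sorted input of this size neither overflows the stack nor degrades to quadratic time, even with the naive pivot.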

#include <stddef.h>   // size_t
#include <stdint.h>   // uint32_t
#include <utility>    // std::swap

void QuickSort(uint32_t a[], size_t lo, size_t hi) {
    while(lo < hi){
        size_t i = lo, j = (lo+hi)/2, k = hi;
        uint32_t p;
        if (a[k] < a[i])            // median of 3
            std::swap(a[k], a[i]);
        if (a[j] < a[i])
            std::swap(a[j], a[i]);
        if (a[k] < a[j])
            std::swap(a[k], a[j]);
        p = a[j];
        i--;                        // Hoare partition
        k++;
        while (1) {
            while (a[++i] < p);
            while (a[--k] > p);
            if (i >= k)
                break;
            std::swap(a[i], a[k]);
        }
        i = k++;
        while(i > lo && a[i] == p)  // exclude middle values == pivot
            i--;
        while(k < hi && a[k] == p)
            k++;
        // recurse on smaller part, loop on larger part
        if((i - lo) <= (hi - k)){
            QuickSort(a, lo, i);
            lo = k;
        } else {
            QuickSort(a, k, hi);
            hi = i;
        }
    }
}
