Why does my quicksort crash with large, reverse-sorted arrays?

Date: 2021-06-02 11:44:30

I'm learning C and tried out a recursive quicksort algorithm. At small input sizes it works as expected, and with randomly generated arrays it had no problems at any tested size (up to 100,000). With a descending array, however, it breaks at a certain array size (32,506): Windows shows a message that the program has stopped working. Is there an error in my code (for example, wrong memory allocation; I'm not sure I got that right), or does C have a limit on recursive calls, or is it something else?

Edit: I know that my quicksort implementation is rather naive and that it behaves terribly with this sort of input, but I didn't expect it to crash.

I am using GCC with MinGW from the command prompt on Windows 10. I'm not sure how to find out exactly what happens, because I don't get any specific error message; Windows only tells me that the program has stopped working.

#include <stdio.h>
#include <stdlib.h>

int partition(int *a, int lo, int hi) {
    int i = lo; int j = hi+1; int v,t;
    v = a[lo]; //partition element
    while (1) {
        while (a[++i] < v) {if (i == hi) break;}
        while (v < a[--j]) {if (j == lo) break;}
        if (i >= j) break;
        t = a[j]; a[j] = a[i]; a[i]= t; //swap
    }
    t = a[lo]; a[lo] = a[j]; a[j]= t;//swap
    return j;
}

void quicksort(int a[], int lo, int hi) {
    int j;
    if (hi <= lo) return;
    j = partition(a, lo, hi);
    quicksort(a, lo, j-1);
    quicksort(a, j+1, hi);
}

int main()  {
    int len;
    for (len = 32000;len < 40000;len+=100) {
        printf("New Arr with len = %d\n",len);
        int *arr;
        arr = (int*) calloc(len,sizeof(int));
        int j;
        //create descending Array
        for (j = 0; j < len; ++j) {
            arr[j] = len-j;
        }
        printf("start sorting\n");
        quicksort(arr,0,len-1);
        free(arr);
    }
}

2 solutions

#1 (score: 2)

For me, your code fails only at much larger sizes (c. 370,000 elements). You are likely running into a platform limit, most probably on recursion depth because of a stack overflow. Without the exact error message it's hard to be sure, of course.

Your input set is likely a pathological case for your implementation - see What makes for a bad case for quick sort?
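
To illustrate: with the first element as the pivot, each partition of a reverse-sorted range splits off only one element, so the recursion goes roughly n levels deep, with one stack frame per level. The sketch below reuses the question's partition() unchanged and adds a depth counter to the same recursion; quicksort_depth and max_depth_descending are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* The question's partition(), unchanged: a[lo] is the pivot. */
static int partition(int *a, int lo, int hi) {
    int i = lo; int j = hi+1; int v, t;
    v = a[lo];
    while (1) {
        while (a[++i] < v) { if (i == hi) break; }
        while (v < a[--j]) { if (j == lo) break; }
        if (i >= j) break;
        t = a[j]; a[j] = a[i]; a[i] = t;   /* swap */
    }
    t = a[lo]; a[lo] = a[j]; a[j] = t;     /* move pivot into place */
    return j;
}

/* Same recursion as the question's quicksort(), but it returns the deepest
   recursion level reached (the top-level call is depth 0). */
static int quicksort_depth(int a[], int lo, int hi, int depth) {
    if (hi <= lo) return depth;
    int j = partition(a, lo, hi);
    int left  = quicksort_depth(a, lo, j - 1, depth + 1);
    int right = quicksort_depth(a, j + 1, hi, depth + 1);
    return left > right ? left : right;
}

/* Fill a[0..n-1] with n, n-1, ..., 1 (as the question's main() does),
   sort it, and report the maximum recursion depth. */
static int max_depth_descending(int *a, int n) {
    assert(n > 0);
    for (int k = 0; k < n; ++k) a[k] = n - k;
    return quicksort_depth(a, 0, n - 1, 0);
}
```

For a 1000-element buffer, max_depth_descending should return 999, i.e. n - 1 levels, so the stack space needed grows linearly with the array size.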

You can reduce the recursion depth by a better choice of pivot - a common technique is to take the median of the first, central and last elements. Something like this:

int v0 = a[lo], v1 = a[(lo+hi+1)/2], v2 = a[hi];
/* pivot: median of v0,v1,v2 */
int v = v0 < v1 ? v1 < v2 ? v1 : v0 < v2 ? v2 : v0 : v0 < v2 ? v0 : v1 < v2 ? v2 : v1;
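
Unpacked into if statements, the same selection is easier to check by eye. A sketch with hypothetical names median3 and pivot_for; the middle index (lo+hi+1)/2 is taken from the snippet above. For a reverse-sorted range the first element is the largest and the last the smallest, so the median of the three is the middle element, which splits that range roughly in half.

```c
#include <assert.h>

/* Median of three values: same result as the nested ?: expression above. */
static int median3(int x, int y, int z) {
    if (x < y) {
        if (y < z) return y;        /* x < y < z */
        return x < z ? z : x;       /* z <= y: the larger of x and z */
    }
    /* y <= x */
    if (x < z) return x;            /* y <= x < z */
    return y < z ? z : y;           /* z <= x: the larger of y and z */
}

/* Pivot value for a[lo..hi]: median of the first, middle and last elements. */
static int pivot_for(const int *a, int lo, int hi) {
    assert(lo <= hi);
    return median3(a[lo], a[(lo + hi + 1) / 2], a[hi]);
}
```

For the descending array {10, 9, ..., 1}, pivot_for(a, 0, 9) samples 10, 5 and 1 and picks 5.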

You can also reduce the recursion depth by recursing only for the smaller of the partitions, and using iteration to process the larger one. You may be able to get your compiler's tail-call eliminator to convert the recursion to iteration, but if that doesn't work, you'll need to write it yourself. Something like:

void quicksort(int a[], int lo, int hi) {
    while (lo < hi) {
        int j = partition(a, lo, hi);
        if (j - lo < hi -j) {
            quicksort(a, lo, j-1);
            lo = j+1;
        } else {
            quicksort(a, j+1, hi);
            hi = j-1;
        }
    }
}
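
A sketch of why this bounds the depth: each recursive call handles the smaller part, which has at most half the elements of the range it came from, so the recursion can go at most about log2(n) levels deep. Below, the loop above gets hypothetical depth/max_depth parameters and is paired with the question's unchanged first-element partition(), to isolate the effect of the loop from the pivot choice.

```c
#include <assert.h>

/* The question's partition(), unchanged: a[lo] is the pivot. */
static int partition(int *a, int lo, int hi) {
    int i = lo; int j = hi+1; int v, t;
    v = a[lo];
    while (1) {
        while (a[++i] < v) { if (i == hi) break; }
        while (v < a[--j]) { if (j == lo) break; }
        if (i >= j) break;
        t = a[j]; a[j] = a[i]; a[i] = t;   /* swap */
    }
    t = a[lo]; a[lo] = a[j]; a[j] = t;     /* move pivot into place */
    return j;
}

/* The loop from the answer: recurse on the smaller part, iterate on the
   larger. depth/max_depth are hypothetical additions that record how deep
   the recursion goes. */
static void quicksort_small_first(int a[], int lo, int hi, int depth, int *max_depth) {
    if (depth > *max_depth) *max_depth = depth;
    while (lo < hi) {
        int j = partition(a, lo, hi);
        if (j - lo < hi - j) {
            quicksort_small_first(a, lo, j - 1, depth + 1, max_depth);
            lo = j + 1;
        } else {
            quicksort_small_first(a, j + 1, hi, depth + 1, max_depth);
            hi = j - 1;
        }
    }
}

/* Fill a[0..n-1] with n, n-1, ..., 1, sort it, return the maximum depth. */
static int max_depth_descending_small_first(int *a, int n) {
    assert(n > 0);
    for (int k = 0; k < n; ++k) a[k] = n - k;
    int max_depth = 0;
    quicksort_small_first(a, 0, n - 1, 0, &max_depth);
    return max_depth;
}
```

On 2000 reverse-sorted elements the depth stays within floor(log2 2000) = 10; on this input it is in fact 1, because one side of every split is empty and that is the side the code recurses into. The running time is still quadratic here, though: a better pivot, such as the median of three, is what fixes that.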

With the above changes, I can sort arrays of over a billion elements without crashing (I had to make some performance improvements - see below - and even then, it took 17 seconds).

You may also want to return early when you find a sub-array is already sorted. I'll leave that as an exercise.
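
One possible take on that exercise, assuming an extra linear scan per step is acceptable: before partitioning a range, check whether it is already in order and stop if it is. The sketch reuses the question's first-element partition() with a precondition check and a hypothetical partition_calls counter added, so the effect is easy to observe; is_sorted_range and quicksort_early_exit are likewise illustrative names.

```c
#include <assert.h>

static long partition_calls = 0;  /* illustration only: counts partition() calls */

/* The question's partition(), plus a precondition check and the counter. */
static int partition(int *a, int lo, int hi) {
    int i = lo; int j = hi+1; int v, t;
    assert(lo < hi);
    ++partition_calls;
    v = a[lo];
    while (1) {
        while (a[++i] < v) { if (i == hi) break; }
        while (v < a[--j]) { if (j == lo) break; }
        if (i >= j) break;
        t = a[j]; a[j] = a[i]; a[i] = t;   /* swap */
    }
    t = a[lo]; a[lo] = a[j]; a[j] = t;     /* move pivot into place */
    return j;
}

/* 1 if a[lo..hi] is already in non-decreasing order, else 0. */
static int is_sorted_range(const int *a, int lo, int hi) {
    for (int k = lo; k < hi; ++k)
        if (a[k] > a[k + 1]) return 0;
    return 1;
}

/* The answer's loop, returning early once the current range is sorted. */
void quicksort_early_exit(int a[], int lo, int hi) {
    while (lo < hi) {
        if (is_sorted_range(a, lo, hi)) return;
        int j = partition(a, lo, hi);
        if (j - lo < hi - j) {
            quicksort_early_exit(a, lo, j - 1);
            lo = j + 1;
        } else {
            quicksort_early_exit(a, j + 1, hi);
            hi = j - 1;
        }
    }
}
```

With an ascending array of 10,000 elements, partition_calls stays at 0. The check can cost up to O(n) per partition step, so it mainly pays off when already sorted ranges are common.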


P.S. A couple of issues in your main():

You don't test the result of calloc() - and you probably should be using malloc() instead, as you will write every element anyway:

int *arr = malloc(len * sizeof *arr);
if (!arr) return fprintf(stderr, "allocation failed\n"), EXIT_FAILURE;
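
Put into a helper, the allocate, check and fill steps could look like the sketch below (make_descending is a hypothetical name). The extra size check guards the len * sizeof(int) multiplication, which can overflow size_t on a platform where size_t is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate len ints and fill them with len, len-1, ..., 1, as the question's
   main() does. Returns NULL if len is negative, if the byte count would not
   fit in size_t, or if malloc() fails. The caller must free() the result. */
static int *make_descending(int len) {
    if (len < 0 || (size_t)len > SIZE_MAX / sizeof(int))
        return NULL;
    /* malloc rather than calloc: every element is written below anyway */
    int *arr = malloc((size_t)len * sizeof *arr);
    if (!arr)
        return NULL;
    for (int j = 0; j < len; ++j)
        arr[j] = len - j;
    return arr;
}
```

A caller would then write int *arr = make_descending(len); followed by if (!arr) return EXIT_FAILURE; and call free(arr) when done.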

Full listing

Here's the code I ended up with:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int partition(int *a, int i, int j) {
    int v0 = a[i], v1 = a[(i+j+1)/2], v2 = a[j];
    /* pivot: median of v0,v1,v2 */
    int v = v0 < v1 ? v1 < v2 ? v1 : v0 < v2 ? v2 : v0 : v0 < v2 ? v0 : v1 < v2 ? v2 : v1;
    while (i < j) {
        while (a[i] < v && ++i < j)
            ;
        while (v < a[j] && i < --j)
            ;
        int t = a[j]; a[j] = a[i]; a[i]= t; //swap
    }
    /* i == j; that's where the pivot belongs */
    a[i] = v;
    return j;
}

void quicksort(int a[], int lo, int hi) {
    while (lo < hi) {
        int j = partition(a, lo, hi);
        if (j - lo < hi -j) {
            quicksort(a, lo, j-1);
            lo = j+1;
        } else {
            quicksort(a, j+1, hi);
            hi = j-1;
        }
    }
}

int main()  {
    int len = INT_MAX/2+1;
    printf("New Arr with len = %d\n",len);
    int *arr = malloc(len * sizeof *arr);
    if (!arr) return fprintf(stderr, "allocation failed\n"), EXIT_FAILURE;

    /* populate pessimal array */
    for (int j = 0; j < len; ++j) {
        arr[j] = len-j;
    }

    printf("start sorting\n");
    quicksort(arr, 0, len-1);

    /* test - is it sorted? */
    for (int i = 0;  i+1 < len;  ++i)
        if (arr[i] >= arr[i+1])
            return fprintf(stderr, "not sorted\n"), EXIT_FAILURE;
    free(arr);
}
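
One caveat about the partition() in this listing: it assumes distinct keys. If a[i] and a[j] both equal the pivot v, neither inner loop moves, the swap changes nothing, and the outer loop never ends; the input {5, 5, 5} already triggers it. The test in main() uses distinct values, so it is unaffected. If duplicates matter, a Hoare-style partition is a common alternative. The sketch below swaps in that technique (it is not the answer's loop), keeps the median-of-three pivot and the smaller-part-first recursion, and uses hypothetical names (median3, hoare_partition, quicksort_hoare, is_nondecreasing). It computes the middle index as lo + (hi - lo) / 2, which stays below hi; that is what guarantees lo <= p < hi.

```c
#include <assert.h>

/* Median of three values, same ordering logic as the answer's ?: chain. */
static int median3(int x, int y, int z) {
    if (x < y) return y < z ? y : (x < z ? z : x);
    return x < z ? x : (y < z ? z : y);
}

/* Hoare-style partition of a[lo..hi] (requires lo < hi). On return, every
   element of a[lo..p] is <= pivot, every element of a[p+1..hi] is >= pivot,
   and lo <= p < hi. Elements equal to the pivot stop both scans and are
   swapped, so a run of equal keys is split roughly in half. */
static int hoare_partition(int *a, int lo, int hi) {
    assert(lo < hi);
    int pivot = median3(a[lo], a[lo + (hi - lo) / 2], a[hi]);
    int i = lo - 1, j = hi + 1;
    for (;;) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j) return j;
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

/* Recurse on the smaller part, loop on the larger one, as in the answer. */
void quicksort_hoare(int *a, int lo, int hi) {
    while (lo < hi) {
        int p = hoare_partition(a, lo, hi);
        if (p - lo < hi - p) {          /* a[lo..p] is not larger than a[p+1..hi] */
            quicksort_hoare(a, lo, p);
            lo = p + 1;
        } else {
            quicksort_hoare(a, p + 1, hi);
            hi = p;
        }
    }
}

/* Check helper: 1 if a[0..n-1] is in non-decreasing order. */
static int is_nondecreasing(const int *a, int n) {
    for (int k = 0; k + 1 < n; ++k)
        if (a[k] > a[k + 1]) return 0;
    return 1;
}
```

The two ranges are a[lo..p] and a[p+1..hi]: unlike in the answer's version, the pivot is not necessarily at index p, so p stays inside the left range. quicksort_hoare(arr, 0, len - 1) can replace the quicksort(arr, 0, len - 1) call in main().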

#2 (score: 0)

The recursion is too deep to fit on the stack. Each level of recursion needs its own stack frame, including storage for int j = partition(...). There are declarative techniques to minimize recursive stack usage, for example carrying intermediate results as arguments, but this case is too complicated for me to give an example here.
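
A different, more direct way to keep deep partitioning off the call stack is to manage the pending ranges yourself. The sketch below is a swapped-in technique (an explicit stack, not the declarative approach described above): it reuses the question's partition(), always continues with the smaller part and defers the larger one, which keeps the number of pending ranges at most about log2(n). quicksort_iterative and MAX_PENDING are hypothetical names.

```c
#include <assert.h>

/* The question's partition(), unchanged: a[lo] is the pivot. */
static int partition(int *a, int lo, int hi) {
    int i = lo; int j = hi+1; int v, t;
    v = a[lo];
    while (1) {
        while (a[++i] < v) { if (i == hi) break; }
        while (v < a[--j]) { if (j == lo) break; }
        if (i >= j) break;
        t = a[j]; a[j] = a[i]; a[i] = t;   /* swap */
    }
    t = a[lo]; a[lo] = a[j]; a[j] = t;     /* move pivot into place */
    return j;
}

/* Deferring the larger part keeps at most about log2(n) ranges pending,
   which is below 32 for any int-sized n; 64 leaves headroom. */
#define MAX_PENDING 64

/* Sort a[0..n-1] without recursion: deferred ranges are kept in two
   fixed-size local arrays instead of one stack frame per level. */
void quicksort_iterative(int *a, int n) {
    int pend_lo[MAX_PENDING], pend_hi[MAX_PENDING];
    int top = 0;                      /* number of deferred ranges */
    int lo = 0, hi = n - 1;
    for (;;) {
        while (lo < hi) {
            int j = partition(a, lo, hi);
            assert(top < MAX_PENDING);
            if (j - lo < hi - j) {    /* left part smaller: defer the right */
                pend_lo[top] = j + 1; pend_hi[top] = hi; ++top;
                hi = j - 1;
            } else {                  /* right part smaller: defer the left */
                pend_lo[top] = lo; pend_hi[top] = j - 1; ++top;
                lo = j + 1;
            }
        }
        if (top == 0) break;          /* nothing left to sort */
        --top;
        lo = pend_lo[top]; hi = pend_hi[top];
    }
}
```

Called as quicksort_iterative(arr, len), it sorts the whole array. The question's first-element partition() still makes reverse-sorted input quadratic in time, but the memory used for bookkeeping stays fixed.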