Sorting and removing duplicates from an int array in C

Time: 2023-01-13 19:29:25

I am learning C and came across the topic of sorting. I wrote a comp() function and used qsort to sort an array of int. For the next task I need to remove the duplicates from the array.
Is it possible to sort and remove duplicates at the same time?

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>    
int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

int comp(const void * elem1, const void * elem2) {

    int f = *((int*) elem1);
    int s = *((int*) elem2);

    if (f > s) {    
        return 1;
    }    
    if (f < s) {    
        return -1;
    }    
    return 0;
}

void printIndexArray() {    
    int i = 0;    
    for (i = 0; i < 10; i++) {    
        printf("i is %d\n", indexes[i]);    
    }
}

int main() {    
    qsort(indexes, sizeof(indexes) / sizeof(int), sizeof(int), comp);    
    printIndexArray();    
    return 0;
}

5 solutions

#1


2  

Since your numbers are already sorted, removing dupes is easy. In C++, it's even built in as std::unique:

http://en.cppreference.com/w/cpp/algorithm/unique

Assuming you want to do it yourself, you can do it the same way unique does it:

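/* Compacts the sorted range [first, last) so that each value appears once.
   Returns a pointer one past the last unique element. */
int* unique (int* first, int* last)
{
  if (first==last) return last;

  int* result = first;           /* last unique element kept so far */
  while (++first != last)
  {
    if (!(*result == *first))    /* new value: store it right after result */
      *(++result)=*first;
  }
  return ++result;
}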
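
To connect this with the qsort call from the question, here is a minimal sketch that sorts first and then compacts the array with the same unique() logic. The wrapper name sort_unique is hypothetical (it does not appear in the question or the answer); the new length is simply the returned pointer minus the start of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparator for qsort, equivalent to the comp() in the question. */
static int comp(const void *elem1, const void *elem2)
{
    int f = *(const int *)elem1;
    int s = *(const int *)elem2;
    return (f > s) - (f < s);
}

/* Same logic as unique() above: compacts the sorted range [first, last)
 * and returns a pointer one past the last unique element. */
static int *unique(int *first, int *last)
{
    if (first == last) return last;
    int *result = first;
    while (++first != last) {
        if (*result != *first)
            *(++result) = *first;
    }
    return ++result;
}

/* Hypothetical helper: sorts a[0..n) and returns the number of unique
 * values, which end up in a[0..result). */
size_t sort_unique(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0) return 0;
    qsort(a, n, sizeof a[0], comp);
    return (size_t)(unique(a, a + n) - a);
}
```

Calling sort_unique(indexes, 10) on the array from the question should leave the unique values at the front of the array; only the first "returned length" elements are meaningful afterwards.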

#2


1  

Yes

This can be achieved with merge sort: in the merge step, when the current left and right elements are equal, merge just one of them into the output.
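
As a rough illustration of that merge step, the sketch below merges two already sorted, duplicate-free arrays into a separate output buffer. The function name merge_unique and its parameters are assumptions made for this example; the caller is assumed to provide an output buffer with room for nl + nr values.

```c
#include <assert.h>
#include <stddef.h>

/* Merges the sorted, duplicate-free arrays left[0..nl) and right[0..nr)
 * into out, which must have room for nl + nr ints.
 * Returns the number of values written. */
size_t merge_unique(const int *left, size_t nl,
                    const int *right, size_t nr, int *out)
{
    assert(out != NULL);
    size_t i = 0, j = 0, k = 0;

    while (i < nl && j < nr) {
        if (left[i] < right[j]) {
            out[k++] = left[i++];
        } else if (right[j] < left[i]) {
            out[k++] = right[j++];
        } else {
            /* Equal on both sides: keep one copy, advance both inputs. */
            out[k++] = left[i++];
            j++;
        }
    }
    /* At most one of the inputs still has values left. */
    while (i < nl) out[k++] = left[i++];
    while (j < nr) out[k++] = right[j++];
    return k;
}
```

Because the return value can be smaller than nl + nr, whatever sort drives this merge has to track the shrinking run lengths rather than assume fixed-size halves.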

#3


1  

Here is code that removes the duplicates using merge sort. This snippet does the removing work:

else if(a[p1] == a[p2])
{
    merged[p] = a[p1];
    p1++;
    p2++;
}

The full program below is an iterative merge sort; a recursive version would be easier.

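#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

/*
 * Merge the sorted, duplicate-free runs a[s..m) and a[m..e) into merged.
 * A value present in both runs is written only once.
 * Returns the number of elements written.
 */
size_t merge(const int *a, size_t s, size_t m, size_t e, int *merged)
{
    size_t p1 = s;
    size_t p2 = m;
    size_t p = 0;
    while(p1 < m && p2 < e)
    {
        if(a[p1] > a[p2])
        {
            merged[p] = a[p2];
            p2++;
        }
        else if(a[p1] == a[p2])
        {
            merged[p] = a[p1];
            p1++;
            p2++;
        }
        else
        {
            merged[p] = a[p1];
            p1++;
        }
        p++;
    }

    while(p1 < m)
        merged[p++] = a[p1++];

    while(p2 < e)
        merged[p++] = a[p2++];

    return p;
}

/*
 * Iterative (bottom-up) merge sort that drops duplicates.
 * Runs shrink as duplicates are removed, so end[k] records where run k ends
 * and every pass packs the merged runs back to the front of a.
 * Returns the number of unique elements, stored in a[0..result).
 */
size_t merge_sort(int *a, size_t n)
{
    if(n < 2)
        return n;

    size_t *end = malloc(n * sizeof *end);
    int *merged = malloc(n * sizeof *merged);
    if(end == NULL || merged == NULL)
    {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    size_t runs = n;                 /* start with n runs of length 1 */
    size_t k;
    for(k = 0; k < n; k++)
        end[k] = k + 1;

    while(runs > 1)
    {
        size_t s = 0;                /* start of the next unread run */
        size_t w = 0;                /* next write position in a */
        size_t out = 0;              /* runs produced by this pass */
        for(k = 0; k < runs; k += 2)
        {
            size_t m = end[k];
            size_t e = (k + 1 < runs) ? end[k + 1] : m;
            size_t len = merge(a, s, m, e, merged);
            memcpy(a + w, merged, len * sizeof *a);
            s = e;
            w += len;
            end[out++] = w;
        }
        runs = out;
    }

    n = end[0];
    free(end);
    free(merged);
    return n;
}

void printIndexArray(size_t len)
{
    size_t i;
    for(i = 0; i < len; i++)
    {
        printf("i is %d\n", indexes[i]);
    }
}

int main()
{
    size_t len = merge_sort(indexes, sizeof(indexes) / sizeof(int));
    printIndexArray(len);
    return 0;
}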
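
The answer mentions that a recursive version would be easier. A minimal sketch of what that could look like follows; it is not the answerer's code, and the names sort_unique_rec and merge_sort_unique are made up for this example. Each call returns the number of unique values it produced, so the caller always knows how long each sorted half really is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sorts a[0..n) and drops duplicates, using tmp (room for n ints) as
 * scratch space. Returns the number of unique values, left in a[0..result). */
static size_t sort_unique_rec(int *a, size_t n, int *tmp)
{
    if (n < 2) return n;

    size_t mid = n / 2;
    size_t nl = sort_unique_rec(a, mid, tmp);            /* left:  a[0..nl)        */
    size_t nr = sort_unique_rec(a + mid, n - mid, tmp);  /* right: a[mid..mid+nr)  */

    size_t i = 0, j = mid, k = 0;
    while (i < nl && j < mid + nr) {
        if (a[i] < a[j]) {
            tmp[k++] = a[i++];
        } else if (a[j] < a[i]) {
            tmp[k++] = a[j++];
        } else {
            tmp[k++] = a[i++];   /* equal: keep one copy, skip the other */
            j++;
        }
    }
    while (i < nl)       tmp[k++] = a[i++];
    while (j < mid + nr) tmp[k++] = a[j++];

    memcpy(a, tmp, k * sizeof a[0]);   /* pack the result at the front */
    return k;
}

/* Hypothetical wrapper that owns the scratch buffer. */
size_t merge_sort_unique(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2) return n;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) abort();          /* sketch only: no error recovery */

    size_t len = sort_unique_rec(a, n, tmp);
    free(tmp);
    return len;
}
```

The one design point worth noting is that the right half is read from a[mid..mid+nr), not from directly behind the left half, because the left half may have shrunk and left a gap.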

#4


0  

The short answer is: yes.

The long answer is: it is always possible, but the complexity to do it depends heavily on the algorithm you use.

The more complex algorithms like quick-sort, slow-sort, bucket-sort, and straight-radix-sort do not lend themselves to such an enhancement, because they rely on the data being in a contiguous array that can implicitly be split into subarrays. So when you detect a duplicate, you cannot easily take it out. Again, it is possible, but it is certainly not a task for beginners.

The less complex in-place algorithms like bubble-sort, insertion-sort, and shell-sort make it relatively easy: you can just replace one of the duplicates you detect with a sentinel value that sorts greater than all legal values, and let it rise to the top. After that, you just need to scoop off the cream of sentinel values and you are done.
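
One way the sentinel idea could look with bubble sort is sketched below. It assumes that INT_MAX never occurs as a real value in the input, so it can serve as the sentinel; that assumption, and the name bubble_sort_unique, are not from the answer. The sketch repeats full passes until nothing changes, because a sentinel created late must still be able to travel all the way to the end of the array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: INT_MAX is not a legal input value, so it can mark duplicates. */
#define SENTINEL INT_MAX

/* Bubble sort that overwrites detected duplicates with SENTINEL.
 * Returns the number of unique values, left in a[0..result). */
size_t bubble_sort_unique(int *a, size_t n)
{
    assert(a != NULL || n == 0);

    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t j = 0; j + 1 < n; j++) {
            if (a[j] > a[j + 1]) {
                int t = a[j];            /* ordinary bubble-sort swap */
                a[j] = a[j + 1];
                a[j + 1] = t;
                changed = 1;
            } else if (a[j] == a[j + 1] && a[j] != SENTINEL) {
                a[j + 1] = SENTINEL;     /* duplicate: let it rise to the top */
                changed = 1;
            }
        }
    }

    /* Scoop off the sentinels: count the real values at the front. */
    size_t len = 0;
    while (len < n && a[len] != SENTINEL)
        len++;
    return len;
}
```

When the loop stops, every adjacent pair is in order and no two adjacent real values are equal, so the array holds the unique values followed by the sentinels.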

The algorithms that really lend themselves to removing duplicates are the ones that use intermediate arrays that grow or shrink in the process; in these cases you can just shrink one of these intermediate arrays, or skip growing it, when you detect a duplicate. Candidates are merge-sort and heap-sort.
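
For heap-sort, "skip growing" could be read as follows: the sorted output region at the end of the array only grows when the value taken from the heap differs from the one placed last. The sketch below is one possible interpretation, not the answerer's code; the names sift_down and heap_sort_unique are chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Restores the max-heap property for the subtree rooted at i in h[0..n). */
static void sift_down(int *h, size_t n, size_t i)
{
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] > h[largest]) largest = l;
        if (r < n && h[r] > h[largest]) largest = r;
        if (largest == i) return;
        int t = h[i];
        h[i] = h[largest];
        h[largest] = t;
        i = largest;
    }
}

/* Heap sort that skips duplicates while building the sorted output.
 * Returns the number of unique values, left in a[0..result). */
size_t heap_sort_unique(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2) return n;

    for (size_t i = n / 2; i-- > 0; )   /* build a max-heap in place */
        sift_down(a, n, i);

    size_t heap_n = n;   /* the heap occupies a[0..heap_n)            */
    size_t out = n;      /* sorted unique values occupy a[out..n)     */
    while (heap_n > 0) {
        int max = a[0];
        a[0] = a[--heap_n];             /* remove the root from the heap */
        sift_down(a, heap_n, 0);
        if (out == n || a[out] != max)  /* grow the output only for new values */
            a[--out] = max;
    }

    memmove(a, a + out, (n - out) * sizeof a[0]);   /* pack to the front */
    return n - out;
}
```

The heap shrinks by one on every step while the output grows by at most one, so the two regions never overlap and no extra buffer is needed.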

Note, however, that it is more prudent to just sort the array and eliminate duplicates in a second, separate step. Why? Because eliminating duplicates inside the sort adds work to the inner loop of the sorting algorithm, which runs O(n*log(n)) times in most relevant cases, whereas eliminating duplicates from an already sorted array is a single O(n) pass. That makes the split operation faster than the fused one.

#5


0  

#include <stdio.h>
#include <stdlib.h>

int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

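/* Removes adjacent duplicates from a sorted array in place;
   returns the new length. */
size_t undup(int array[], size_t len)
{
    size_t src, dst;

    if (!len) return 0;
    for (src = dst = 1; src < len; src++) {
        if (array[dst-1] == array[src]) continue;   /* duplicate: skip it */
        array[dst++] = array[src];
    }
    return dst;
}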

int comp(const void * elem1, const void * elem2) {

    int f = *((int*) elem1);
    int s = *((int*) elem2);

    if (f > s)     return 1;
    if (f < s)     return -1;

    return 0;
}

void printIndexArray(size_t len) {
    size_t i = 0;
    for (i = 0; i < len; i++) {
        printf("array[%zu] is %d\n", i, indexes[i]);
    }
}

int main() {
    size_t len = 10;
    printf("Before sort\n" );
    printIndexArray(len);

    qsort(indexes, sizeof indexes / sizeof indexes[0], sizeof indexes[0], comp);
    printf("After sort\n" );
    printIndexArray(len);

    len = undup(indexes,10);
    printf("After undup\n" );
    printIndexArray(len);

    return 0;
}
