
Time: 2022-10-27 07:42:02

I have an array of strings in C and an integer indicating how many strings are in the array.


char *strarray[MAX];  
int strcount;

In this array, the highest index (where 10 is higher than 0) is the most recently added item and the lowest index is the oldest item. The order of items within the array matters.


I need a quick way to check the array for duplicates, remove all but the highest index duplicate, and collapse the array.


For example:


strarray[0] = "Line 1"; 
strarray[1] = "Line 2"; 
strarray[2] = "Line 3"; 
strarray[3] = "Line 2"; 
strarray[4] = "Line 4";

would become:


strarray[0] = "Line 1"; 
strarray[1] = "Line 3"; 
strarray[2] = "Line 2"; 
strarray[3] = "Line 4";

Index 1 of the original array was removed and indexes 2, 3, and 4 slid downwards to fill the gap.


I have one idea of how to do it. It is untested and I am currently attempting to code it but just from my faint understanding, I am sure this is a horrendous algorithm.


The algorithm presented below would be run every time a new string is added to strarray.


In the interest of showing that I am trying, I will include my proposed algorithm below:


  1. Search entire strarray for match to str
  2. If no match, do nothing
  3. If match found, put str in strarray
  4. Now we have a strarray with a max of 1 duplicate entry
  5. Add highest index strarray string to lowest index of temporary string array
  6. Continue downwards into strarray and check each element
  7. If duplicate found, skip it
  8. If not, add it to the next highest index of the temporary string array
  9. Reverse temporary string array and copy to strarray

Once again, this is untested (I am currently implementing it now). I just hope someone out there will have a much better solution.


The order of items is important and the code must utilize the C language (not C++). The lowest index duplicates should be removed and the single highest index kept.


Thank you!


4 solutions



The typical efficient unique function is to:


  1. Sort the given array.
  2. Collapse each consecutive run of the same item so that only one remains.

I believe you can use qsort in combination with strcmp to accomplish the first part; writing an efficient remove would be all on you though.


Unfortunately I don't have specific ideas here; this is kind of a grey area for me because I'm usually using C++, where this would be a simple:


std::vector<std::string> src;
std::sort(src.begin(), src.end());
src.erase(std::unique(src.begin(), src.end()), src.end());

I know you can't use C++, but the implementation should essentially be the same.


Because you need to save the original order, you can have something like:


typedef struct
{
    int originalPosition;
    char * string;
} tempUniqueEntry;

Do your first sort with respect to string, remove the duplicates from the sorted set, then re-sort with respect to originalPosition. This way you still get O(n lg n) performance, yet you don't lose the original order.


EDIT2: Simple C implementation example of std::unique:


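tempUniqueEntry* unique ( tempUniqueEntry * first, tempUniqueEntry * last )
{
  tempUniqueEntry *result=first;
  if (first == last) return last;  /* empty range */
  while (++first != last)
    if (strcmp(result->string,first->string))  /* differs from the last kept entry */
      *(++result)=*first;
  return ++result;
}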
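
A minimal sketch of the whole sort, unique, re-sort idea, assuming the array holds strcount valid (non-NULL) strings. The names dedup_sorted, by_string and by_position are made up for illustration. Ties in the first sort are broken by descending originalPosition, so the entry that survives each run is the one with the highest index, which is what the question asks for.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as the struct above: a string plus its index in strarray. */
typedef struct {
    int originalPosition;
    char *string;
} tempUniqueEntry;

/* Order by string; among equal strings, put the highest original index first
   so that keeping the first entry of each run keeps the newest one. */
static int by_string(const void *a, const void *b)
{
    const tempUniqueEntry *x = a, *y = b;
    int c = strcmp(x->string, y->string);
    if (c != 0)
        return c;
    return (y->originalPosition > x->originalPosition) -
           (y->originalPosition < x->originalPosition);
}

/* Order by original index to restore the insertion order. */
static int by_position(const void *a, const void *b)
{
    const tempUniqueEntry *x = a, *y = b;
    return (x->originalPosition > y->originalPosition) -
           (x->originalPosition < y->originalPosition);
}

/* Hypothetical helper: removes all but the highest-index copy of each string
   and returns the new count, or -1 (array untouched) if malloc fails. */
int dedup_sorted(char *strarray[], int strcount)
{
    tempUniqueEntry *tmp;
    int i, kept;

    if (strcount < 2)
        return strcount;
    tmp = malloc((size_t)strcount * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (i = 0; i < strcount; i++) {
        tmp[i].originalPosition = i;
        tmp[i].string = strarray[i];
    }
    qsort(tmp, (size_t)strcount, sizeof *tmp, by_string);

    /* Keep the first entry of every run of equal strings. */
    kept = 0;
    for (i = 1; i < strcount; i++)
        if (strcmp(tmp[kept].string, tmp[i].string) != 0)
            tmp[++kept] = tmp[i];
    kept++;

    /* Back to insertion order, then copy the survivors into place. */
    qsort(tmp, (size_t)kept, sizeof *tmp, by_position);
    for (i = 0; i < kept; i++)
        strarray[i] = tmp[i].string;
    for (i = kept; i < strcount; i++)
        strarray[i] = NULL;   /* clear the now-unused tail */

    free(tmp);
    return kept;
}
```

With the five-element example from the question this returns 4 and leaves "Line 1", "Line 3", "Line 2", "Line 4" in that order. The pointers to the removed duplicates are simply dropped, so if the strings were malloc'd you would need to free them before they are overwritten.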



I don't quite understand your proposed algorithm (I don't understand what it means to add a string to an index in step 5), but what I would do is:


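unsigned int i;
for (i = n; i > 0; i--)
{
    unsigned int j;

    if (strarray[i - 1] == NULL)
        continue;  /* already flagged as a duplicate */

    for (j = i - 1; j > 0; j--)
    {
        /* skip entries flagged earlier; strcmp must never see NULL */
        if (strarray[j - 1] != NULL &&
            strcmp(strarray[i - 1], strarray[j - 1]) == 0)
            strarray[j - 1] = NULL;
    }
}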

Then you just need to filter the null pointers out of your array (which I'll leave as an exercise).
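
For completeness, the filtering step could look something like this. It is only a sketch: compact_nulls is a made-up name, and it assumes NULL is used purely as the "removed" flag.

```c
#include <stddef.h>

/* Hypothetical helper: slides non-NULL entries down over the NULL gaps,
   preserving their order, and returns the new element count. */
int compact_nulls(char *strarray[], int strcount)
{
    int read, write = 0;

    for (read = 0; read < strcount; read++) {
        if (strarray[read] != NULL)
            strarray[write++] = strarray[read];
    }
    /* Clear the tail so no stale pointers remain past the new end. */
    for (read = write; read < strcount; read++)
        strarray[read] = NULL;

    return write;
}
```

You would call it right after the flagging loop, e.g. strcount = compact_nulls(strarray, strcount);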


A different approach would be to iterate backwards over the array and to insert each item into a (balanced) binary search tree as you go. If the item is already in the binary search tree, flag the array item (such as setting the array element to NULL) and move on. When you've processed the entire array, filter out the flagged elements as before. This would have slightly more overhead and would consume more space, but its running time would be O(n log n) instead of O(n^2).
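
If you are on a POSIX system, one way to try the tree idea without writing your own tree is tsearch/tfind/tdelete from <search.h>. This is a sketch under assumptions: dedup_tree and cmp_str are made-up names, all entries are non-NULL, and POSIX does not actually require the tree to be balanced (glibc uses a red-black tree, other C libraries may differ), so the O(n log n) bound depends on your libc.

```c
#include <search.h>   /* POSIX tsearch, tfind, tdelete */
#include <stddef.h>
#include <string.h>

/* The keys stored in the tree are the char * strings themselves. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Hypothetical helper: keeps the highest-index copy of each string,
   collapses the array in place, and returns the new count. */
int dedup_tree(char *strarray[], int strcount)
{
    void *root = NULL;
    int i, write = 0;

    /* Walk from newest to oldest; the first sighting of a string wins. */
    for (i = strcount - 1; i >= 0; i--) {
        if (tfind(strarray[i], &root, cmp_str) != NULL)
            strarray[i] = NULL;      /* older duplicate: flag it */
        else if (tsearch(strarray[i], &root, cmp_str) == NULL)
            break;                   /* out of memory: stop flagging early */
    }

    /* Filter out the flagged entries and release the tree nodes. */
    for (i = 0; i < strcount; i++) {
        if (strarray[i] != NULL) {
            tdelete(strarray[i], &root, cmp_str);
            strarray[write++] = strarray[i];
        }
    }
    for (i = write; i < strcount; i++)
        strarray[i] = NULL;

    return write;
}
```

Note that if tsearch runs out of memory the loop stops early and some duplicates may be left in place; a real implementation would probably want to report that.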




Sort the array with an algorithm like qsort (run man 3 qsort in the terminal to see how it should be used), then use strcmp to compare adjacent strings and find duplicates.

If you want to maintain the original order, you could use an O(N^2) algorithm with two nested for loops: the outer loop picks an element each time, and the inner loop scans the rest of the array to find out whether the chosen element is a duplicate.
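
The nested-loop version might be sketched like this. dedup_in_place is a made-up name, and the code assumes every entry is a valid non-NULL string. An element is kept only if no copy of it appears later in the array, so the highest-index duplicate is the one that survives.

```c
#include <string.h>

/* Hypothetical helper: for each element, scan the rest of the array and keep
   the element only if no later copy exists. Returns the new count. */
int dedup_in_place(char *strarray[], int strcount)
{
    int i, j, write = 0;

    for (i = 0; i < strcount; i++) {
        int has_later_copy = 0;

        for (j = i + 1; j < strcount; j++) {
            if (strcmp(strarray[i], strarray[j]) == 0) {
                has_later_copy = 1;
                break;
            }
        }
        /* write never gets ahead of i, so unread entries are never clobbered. */
        if (!has_later_copy)
            strarray[write++] = strarray[i];
    }
    for (i = write; i < strcount; i++)
        strarray[i] = NULL;   /* clear the now-unused tail */

    return write;
}
```

This collapses the array in the same pass, so no temporary array or separate filtering step is needed.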



Can you control the input as it is going into the array? If so, just do something like this:


#include <stdlib.h>
#include <string.h>

// The caller must make sure strarray has room for one more entry.
int addToArray(const char * toadd, char * strarray[], int strcount)
{
    const size_t toaddlen = strlen(toadd);

    // Add new string to end.
    // Remember to add one for the \0 terminator.
    strarray[strcount] = malloc(sizeof(char) * (toaddlen + 1));
    if (strarray[strcount] == NULL)
        return strcount;  // Allocation failed; nothing was added.
    strncpy(strarray[strcount], toadd, toaddlen + 1);

    // Search for a duplicate.
    // Note that we are cutting the new array short by one.
    for(int i = 0; i < strcount; ++i)
    {
        if (strcmp(strarray[i], toadd) == 0)
        {
            // Found duplicate.
            // Remove it and compact.
            // Note use of new array size here.
            // free() assumes every entry was allocated by this function.
            free(strarray[i]);
            for(int k = i + 1; k < strcount + 1; ++k)
                strarray[k - 1] = strarray[k];

            strarray[strcount] = NULL;
            return strcount;
        }
    }

    // No duplicate found.
    return (strcount + 1);
}

You can always use the above function looping over the elements of an existing array, building a new array without duplicates.


PS: If you are doing this type of operation a lot, you should move away from an array as your storage structure and use a linked list instead. Linked lists are much more efficient for removing elements from a location other than the end.
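
A rough sketch of what the linked-list variant might look like. StrNode and list_add_unique are made-up names. Finding the duplicate is still a linear scan, but removing it is a single pointer update instead of shifting every later element down.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; the head is the oldest entry, the last node the newest. */
typedef struct StrNode {
    char *str;
    struct StrNode *next;
} StrNode;

/* Appends a copy of toadd and unlinks any older node with the same text.
   Returns 0 on success, -1 if allocation fails (list unchanged). */
int list_add_unique(StrNode **head, const char *toadd)
{
    size_t len = strlen(toadd) + 1;   /* +1 for the \0 terminator */
    StrNode *node = malloc(sizeof *node);
    StrNode **link;

    if (node == NULL)
        return -1;
    node->str = malloc(len);
    if (node->str == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->str, toadd, len);
    node->next = NULL;

    /* Walk to the end of the list, dropping the older duplicate on the way. */
    link = head;
    while (*link != NULL) {
        if (strcmp((*link)->str, toadd) == 0) {
            StrNode *old = *link;
            *link = old->next;        /* unlink: no shifting needed */
            free(old->str);
            free(old);
        } else {
            link = &(*link)->next;
        }
    }
    *link = node;   /* link now addresses the terminating NULL: append here */
    return 0;
}
```

Start with StrNode *head = NULL; and call list_add_unique(&head, str) for each new string. Order is preserved because new entries always go to the end.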



