How do I find all occurrences and all positions of a substring in a string?

Time: 2021-04-10 19:20:51

I need to find all occurrences and output all positions of a substring in a string.


For example, if my string is abaaab and my substring is aa, the positions are 3 and 4, because aa occurs twice (overlapping) within aaa.

I want the positions to be printed from right to left at the end, followed by the number of occurrences of my substring.

I tried to do it and I have this:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    char *str, c;
    int x = 0, y = 1;

    str = (char*)malloc(sizeof(char));

    printf("Inserisci stringa principale : ");

        while (c != '\n') {
        // read the input from keyboard standard input
        c = getc(stdin);

        // re-allocate (resize) memory for character read to be stored
        str = (char*)realloc(str, y * sizeof(char));

        // store read character by making pointer point to c
        str[x] = c;

        x++;
        y++;
        }

    str[x] = '\0'; // at the end append null character to mark end of string

    printf("\nLa stringa inserita : %s", str);

      char *sub, b;
      int w = 0, z = 1;

      sub = (char*)malloc(sizeof(char));

      printf("Immetti sottostringa da cercare : ");

          while (b != '\n') {
            // read the input from keyboard standard input
            b = getc(stdin);

            // re-allocate (resize) memory for character read to be stored
            sub = (char*)realloc(sub, z * sizeof(char));

            // store read character by making pointer point to c
            sub[w] = b;

            w++;
            z++;
          }

      sub[w] = '\0'; // at the end append null character to mark end of string

    char *p1, *p2, *p3;
    int i=0,j=0,flag=0;

      p1 = str;
      p2 = sub;

      for(i = 0; i<strlen(str); i++)
      {
        if(*p1 == *p2)
          {
              p3 = p1;
              for(j = 0;j<strlen(sub);j++)
              {
                if(*p3 == *p2)
                {
                  p3++;p2++;
                } 
                else
                  break;
              }
              p2 = sub;
              if(j == strlen(sub))
              {
                 flag = 1;
                printf("\nSottostringa trovata all'indice : %d\n",i);
              }
          }
        p1++; 
      }
      if(flag==0)
      {
           printf("Sottostringa non trovata");
      }
    free(str);
    free(sub);
    return (0);
    }

But it only shows me the position of the first occurrence, and not the number of occurrences.


3 answers

#1


2  

There are multiple problems in your code:


  • Your string reallocation scheme is incorrect: the space allocated is one byte too short for the string, and you never test for memory allocation failure. You could use getline() if your system supports it, or at least write a function to factor out the duplicated code.

  • c is uninitialized the first time the loop tests c != '\n': this has undefined behavior.

  • Your matching algorithm is too complicated: you use both index values and moving pointers. Use one or the other.
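If POSIX getline() is available, the reading step mentioned in the first point could be sketched roughly as follows. The helper name read_line is made up for this illustration, and the _POSIX_C_SOURCE define assumes a POSIX system; on other platforms a hand-written reader is needed instead.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: read one line with POSIX getline() and strip the
   trailing newline. Returns a malloc'ed string, or NULL on EOF or error. */
char *read_line(FILE *stream) {
    char *line = NULL;
    size_t cap = 0;
    ssize_t n;

    assert(stream != NULL);
    n = getline(&line, &cap, stream);
    if (n < 0) {            /* EOF with nothing read, or an error */
        free(line);
        return NULL;
    }
    if (n > 0 && line[n - 1] == '\n')
        line[n - 1] = '\0'; /* drop the newline so it is never matched */
    return line;
}
```

Calling read_line(stdin) once for the main string and once for the substring replaces both hand-written input loops; the caller frees each returned string.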

Here is a simplified version:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* read an allocated string from stream.
   stop at newline, not included in string.
   Return NULL upon EOF
 */
char *my_getline(FILE *stream) {
    char *line = NULL;
    size_t pos = 0;
    int c;

    while ((c = getc(stream)) != EOF) {
        char *newp = realloc(line, pos + 2);
        if (newp == NULL) {
            free(line);
            return NULL;
        }
        line = newp;
        if (c == '\n')
            break;
        line[pos++] = (char)c;
    }
    if (line) {
        line[pos] = '\0';
    }
    return line;
}

int main(void) {
    char *str, *sub;
    size_t len1, len2, i, count = 0;

    // type the main string
    printf("Inserisci stringa principale :\n");
    str = my_getline(stdin);

    // type the substring to search for
    printf("Immetti sottostringa da cercare :\n");
    sub = my_getline(stdin);

    if (str && sub) {
        len1 = strlen(str);
        len2 = strlen(sub);
        for (i = 0; i + len2 <= len1; i++) {
            if (!memcmp(str + i, sub, len2)) {
                count++;
                // substring found at offset
                printf("Sottostringa trovata all'indice : %zu\n", i);
            }
        }
        if (count == 0) {
            // substring not found
            printf("Sottostringa non trovata\n");
        }
    }
    free(str);
    free(sub);
    return 0;
}

Notes:

  • The above code finds matches for the empty substring at every offset in the search string. Whether such matches should be reported is a question of specification, but this behavior is consistent with that of strstr().

  • You could also use the standard function strstr() to locate the matches.

Here is a version of the main loop using strstr():


if (str && sub) {
    for (char *p = str; (p = strstr(p, sub)) != NULL; p++) {
        count++;
        // substring found at offset
        printf("Sottostringa trovata all'indice : %td\n", p - str);
        if (*p == '\0')  /* special case for the empty string */
            break;
    }
    if (count == 0) {
        // substring not found
        printf("Sottostringa non trovata\n");
    }
}
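The loops above print the offsets from left to right, while the question asks for them from right to left followed by the count. One possible way to get that, sketched here under a few assumptions, is to store the offsets first and print them afterwards. The names find_all and print_reversed are made up for this example, and the sketch chooses to treat an empty substring as having no matches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store up to cap match offsets of sub in str
   (ascending order) and return the total number of matches found. */
size_t find_all(const char *str, const char *sub, size_t *out, size_t cap) {
    size_t count = 0;

    assert(str != NULL && sub != NULL);
    if (*sub == '\0')
        return 0;   /* assumption: an empty substring never matches */
    for (const char *p = str; (p = strstr(p, sub)) != NULL; p++) {
        if (count < cap)
            out[count] = (size_t)(p - str);
        count++;
    }
    return count;
}

/* Print the stored offsets from right to left, then the total. */
void print_reversed(const size_t *pos, size_t stored, size_t total) {
    for (size_t i = stored; i-- > 0; )
        printf("%zu\n", pos[i]);
    printf("occurrences: %zu\n", total);
}
```

With str "abaaab" and sub "aa", find_all stores the 0-based offsets 2 and 3, so print_reversed prints 3, then 2, then the count 2. Add 1 to each offset if 1-based positions are wanted.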

#2


1  

I've checked your code, and the problem seems to be in the line

if(j == strlen(sub))

Your input loop also stores the trailing '\n' in sub, so when the visible characters of the substring match, the inner loop breaks at the newline and j is 1 less than strlen(sub). Change your code to

if(j+1 == strlen(sub))

and it should solve your problem. Note that a match at the very end of str, where the stored newline matches too, still ends with j == strlen(sub).

To count the occurrences, you need another variable that is incremented whenever the substring matches. Modify the if block as follows:

if(j+1 == strlen(sub))
{
      flag = 1;
      occurrences+=1;  //declare variable occurrences and initialize it to 0
      printf("\nSottostringa trovata all'indice : %d\n",i);
}

Then, after the end of the loop, just print occurrences to get the desired result.
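For comparison, here is a minimal sketch of the same nested-loop idea written with indexes only and a counter. It assumes the trailing newline has already been stripped from both strings; in that case the inner loop finishes with j equal to the substring length on a full match. The function name count_matches is made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count (possibly overlapping) matches of sub in str
   using indexes only. Assumes neither string ends with a newline. */
size_t count_matches(const char *str, const char *sub) {
    size_t len1, len2, i, j, occurrences = 0;

    assert(str != NULL && sub != NULL);
    len1 = strlen(str);
    len2 = strlen(sub);
    if (len2 == 0 || len2 > len1)
        return 0;   /* assumption: an empty substring counts as no match */
    for (i = 0; i + len2 <= len1; i++) {
        for (j = 0; j < len2; j++) {
            if (str[i + j] != sub[j])
                break;
        }
        if (j == len2)   /* inner loop ran to completion: full match */
            occurrences++;
    }
    return occurrences;
}
```

Computing the two lengths once, instead of calling strlen() in every loop condition, also avoids rescanning the strings on each iteration.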

Also, this is not an efficient way to solve the problem. For better approaches, you can refer to

https://www.topcoder.com/community/data-science/data-science-tutorials/introduction-to-string-searching-algorithms/
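As an illustration of what a more efficient approach can look like, here is a sketch of Knuth-Morris-Pratt, a well-known linear-time string-searching algorithm. It is not taken from the answer above; the function name kmp_count and the choice to treat an empty pattern as having no matches are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count overlapping matches of pat in text with the
   Knuth-Morris-Pratt algorithm. Returns (size_t)-1 if allocation fails. */
size_t kmp_count(const char *text, const char *pat) {
    size_t n, m, i, k, count = 0;
    size_t *fail;

    assert(text != NULL && pat != NULL);
    n = strlen(text);
    m = strlen(pat);
    if (m == 0 || m > n)
        return 0;   /* assumption: an empty pattern counts as no match */

    /* fail[i] = length of the longest proper prefix of pat[0..i]
       that is also a suffix of it */
    fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return (size_t)-1;
    fail[0] = 0;
    for (i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    /* scan the text once; k is the number of pattern chars matched */
    for (i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k])
            k = fail[k - 1];
        if (text[i] == pat[k])
            k++;
        if (k == m) {            /* a match ends at text[i] */
            count++;
            k = fail[k - 1];     /* keep going, allowing overlaps */
        }
    }
    free(fail);
    return count;
}
```

The text is scanned once and never re-read, so the running time is proportional to the combined length of the text and the pattern. A match that ends at text[i] starts at offset i + 1 - m, should the positions be needed as well.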

#3


0  

A trivial way of finding each occurrence is to call strstr in a loop. After each match, let strstr resume the search one position after the place where the match was found:

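#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* defined below; prints the matches starting from the last one */
int findRecursive(const char *str, const char *toSearch, ptrdiff_t pos, int nrOfOccurences);

int main(void) {

    const char *string = "abaaab";
    const char *toSearch = "aa";
    int nrOfOccurences = 0;
    printf("searching for occurrences of '%s' in string '%s':\n", toSearch, string);
    const char *pos = string;
    while (pos) {
        pos = strstr(pos, toSearch);
        if (pos) {
            printf("found occurrence at position %td\n", pos - string);
            nrOfOccurences++;
            pos++;  // skip one character
        }
    }
    // print the matches again, last one first, and recount them
    nrOfOccurences = findRecursive(string, toSearch, 0, 0);
    printf("nr of occurrences: %d\n", nrOfOccurences);
    return 0;
}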

If you need to print the occurrences starting from the last one, as the question seems to ask, you could use a recursive function like the following. The call to findRecursive() in main above shows how to use it:

int findRecursive(const char *str, const char *toSearch, ptrdiff_t pos, int nrOfOccurences) {

    const char *next = strstr(str, toSearch);
    if (next) {
        ptrdiff_t foundPos = pos + next - str;
        // recurse first so that later matches are printed before this one
        nrOfOccurences = findRecursive(next + 1, toSearch, foundPos + 1, nrOfOccurences + 1);
        printf("occurrence found at position %td\n", foundPos);
    }
    return nrOfOccurences;
}
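The recursion depth of the function above grows with the number of matches. If that is a concern for long inputs, one alternative is to scan the string backwards with memcmp(), which prints the positions from right to left without recursion or extra storage. This is only a sketch: the name print_matches_reversed is made up, and it assumes an empty substring should count as no match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print match offsets from right to left by scanning
   backwards, then return the number of matches. No recursion is needed. */
size_t print_matches_reversed(const char *str, const char *sub) {
    size_t len1, len2, i, count = 0;

    assert(str != NULL && sub != NULL);
    len1 = strlen(str);
    len2 = strlen(sub);
    if (len2 == 0 || len2 > len1)
        return 0;   /* assumption: an empty substring counts as no match */
    /* i runs from the last possible start offset down to 0 */
    for (i = len1 - len2 + 1; i-- > 0; ) {
        if (memcmp(str + i, sub, len2) == 0) {
            printf("occurrence found at position %zu\n", i);
            count++;
        }
    }
    return count;
}
```

For "abaaab" and "aa" this prints the 0-based positions 3 and then 2, and returns 2.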
