Counting how many times a word appears in a text file in C

Time: 2021-01-13 20:03:32

I am new to C and pointers, so it is still confusing as hell! Below is the code of a function whose main purpose is to find how many times a word appears in a text file. Any help will be appreciated!

void count_occurrences (int n, FILE *file, Entry *entries) {
    file = fopen("test/flicka.txt", "r");
    if (file != NULL) {
        char buff[LINE_MAX_CHARS];
        int i = 0;
        char * haystack = fgets(buff, 1000, file);
        char * needle = NULL;
        char * p = NULL;
        while (haystack != NULL) {
            for (i; i < n; i++) {
                needle = entries[i].string;
                while ( (p = strstr(haystack, needle)) != NULL) {
                    entries[i].count++;
                    p++;
                }    
            }
            haystack = fgets(buff, 1000, file);
            i = 0;
        }
        fclose(file);
    }
    else {
        printf("File not found!\n");
    }
}

1 answer

#1

The problem with an exercise like this is that the best way of solving the specific problem - a character-based state machine attached to the stream - doesn't scale up to larger problems.

To do it the first way, you maintain a "parse position", which is initially zero. You then call fgetc() in a loop until the data runs out and you get EOF. If the character read matches the character of the word at the parse position, increment the parse position; if the parse position reaches the end of the string, you have a match, so increment the count and start again from zero. If the character doesn't match, reset the parse position to zero or one, depending on whether it matches the first character of the word.

The first way is fast and easy, but inflexible.
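
As a rough illustration of the first way, here is a minimal sketch of such a state machine. The function name count_in_stream is made up for this example, and it assumes the stream is already open for reading.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count occurrences of word in stream,
 * reading one character at a time with fgetc(). */
long count_in_stream(FILE *stream, const char *word)
{
    size_t len = strlen(word);
    size_t pos = 0;            /* the "parse position" */
    long count = 0;
    int c;

    if (len == 0)
        return 0;              /* an empty word never matches */

    while ((c = fgetc(stream)) != EOF) {
        if (c == (unsigned char)word[pos]) {
            pos++;             /* one more character matched */
        } else {
            /* mismatch: restart at one if this character begins the word */
            pos = (c == (unsigned char)word[0]) ? 1 : 0;
        }
        if (pos == len) {      /* reached the end of the word */
            count++;
            pos = 0;
        }
    }
    return count;
}
```

One caveat with this simple reset rule: it can miss a match when the word begins with a repeated prefix. Searching for "aab" in "aaab" finds nothing, for example. Handling that properly needs a fallback table in the style of Knuth-Morris-Pratt, which is part of why this approach is called inflexible.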

A more scalable way is to work on line-based input. Call fgets with a big buffer if you know lines must be short, or build a "getline" if lines are unbounded. Then call strstr on the line to see if you have a match. If you do, you need to advance the pointer past the match and check for another.
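
For the unbounded case, a "getline" could be built on top of fgets roughly as follows. This is only a sketch: the name read_line and the starting capacity of 128 are arbitrary choices for the example, and the caller is assumed to free each returned line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc'd string including the trailing '\n' if there was one,
 * or NULL at end of file or on allocation failure. The caller frees it. */
char *read_line(FILE *fp)
{
    size_t cap = 128;          /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;        /* got a complete line */
        if (len + 1 == cap) {  /* buffer is full and the line continues */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len > 0)
        return buf;            /* last line had no trailing newline */
    free(buf);
    return NULL;
}
```

On POSIX systems there is also a standard getline() that does this job, so a hand-rolled version like this is mainly useful where that function is not available.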

The scalable way separates the parse from the IO and allows you to search for multiple patterns. In outline:

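#include <stdio.h>
#include <string.h>

/* Count the non-overlapping occurrences of word in line. */
int countwords(const char *line, const char *word)
{
   const char *ptr = line;
   size_t len = strlen(word);
   int answer = 0;
   if (len == 0)
      return 0; /* an empty word would never advance ptr */
   while ((ptr = strstr(ptr, word)) != NULL)
   {
      ptr += len; /* replace len with 1 to allow overlaps */
      answer++;
   }
   return answer;
}

/* Main loop. Assumes lines fit in the buffer; use a "getline" otherwise. */
int countinfile(FILE *fp)
{
   char line[1024];
   int N = 0;
   while (fgets(line, sizeof line, fp) != NULL)
      N += countwords(line, "myword");
   return N;
}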

Obviously you now need to modify the main loop to search for several words, keeping an array of Ns and calling countwords repeatedly, once for each word. But it scales up to any sort of pattern matching.
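
Applied to the function in the question, that might look like the sketch below. The Entry layout and the value of LINE_MAX_CHARS are guesses, since the question shows only the .string and .count members. This version also reads from the stream it is given instead of opening "test/flicka.txt" itself.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_CHARS 1000    /* assumed; the question does not show its value */

/* Assumed layout: the question only shows the .string and .count members. */
typedef struct {
    const char *string;
    int count;
} Entry;

/* Count the non-overlapping occurrences of word in line. */
static int countwords(const char *line, const char *word)
{
    size_t len = strlen(word);
    int answer = 0;

    if (len == 0)
        return 0;
    while ((line = strstr(line, word)) != NULL) {
        line += len;           /* move past this match before searching again */
        answer++;
    }
    return answer;
}

/* Add the number of matches for each of the n entries to its count.
 * file must already be open for reading; the caller closes it. */
void count_occurrences(int n, FILE *file, Entry *entries)
{
    char buff[LINE_MAX_CHARS];

    while (fgets(buff, sizeof buff, file) != NULL) {
        for (int i = 0; i < n; i++)
            entries[i].count += countwords(buff, entries[i].string);
    }
}
```

The loop in the question never terminates once a word is found, because strstr is called on the same haystack every time and the p++ has no effect on the next search. Moving the search pointer past each match, as countwords does here, is what fixes that.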
