程序中的分段错误错误,用于使用线程计算文件中字的出现次数

时间:2022-07-25 20:59:09

So i have the following problem: Implement a program that gets as arguments a file name followed by words. For each word, create a separate thread that counts its appearances in the given file.Print out the sum of the appearances of all words.

所以我有以下问题:实现一个程序,获取文件名后跟单词作为参数。对于每个单词,创建一个单独的线程,在给定的文件中计算其外观。打印出所有单词的外观总和。

my code is:

我的代码是:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mtx; // used by each of the three threads to prevent  other threads from accessing global_sum during their additions

int global_sum = 0;
typedef struct{
                    char* word;
                    char* filename;
}MyStruct;



void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str; 
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];
    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else
         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){
             if (strcmp(read[j],struc->word)==0)
                count++;
         }

    printf("the no of words is: %d \n",count);  
    pthread_mutex_lock(&mtx); // lock the mutex, to prevent other threads from accessing global_sum
    global_sum += count; // add thread's count result to global_sum
    pthread_mutex_unlock(&mtx); // unlock the mutex, to allow other threads to access the variable
}


int main(int argc, char* argv[]) {
    int i;
    MyStruct str; 

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array 

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];  
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str); 
    }

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

    printf("The global sum is %d.\n", global_sum); // print global sum

    pthread_mutex_destroy(&mtx); // destroy the mutex

    return 0;

}

When I try to run it I get the segmentation fault error. Why is that? Thank you!

当我尝试运行它时,我得到分段错误错误。这是为什么?谢谢!

3 个解决方案

#1


First off, your code is terribly formatted. It's not even consistent. It also does not appear you are compiling with warnings enabled.

首先,您的代码格式非常糟糕。它甚至不一致。您似乎也没有启用警告进行编译。

If you are a university course and they did not tell you how do format the code and compile with warnings, I strongly suggest you ask your tutors what gives.

如果你是大学课程,他们没有告诉你如何格式化代码并编译警告,我强烈建议你问你的导师给出了什么。

If using gcc, add -Wall -Wextra. For coding style, I recommend stealing one either from Linux or FreeBSD. There are various editors which format the code for you, including real editors like vim (which is worth trying out even though it may look harsh).

如果使用gcc,请添加-Wall -Wextra。对于编码风格,我建议从Linux或FreeBSD中窃取一个。有各种编辑器为您编写代码格式,包括像vim这样的真正编辑器(即使看起来很苛刻,也值得尝试)。

Your coding style helps you screw yourself over.

您的编码风格可以帮助您克制自己。

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

buf is unused, which you would learn if you had warnings enabled. read is a bad name.

buf未使用,如果您启用了警告,您将学到它。读是一个坏名字。

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

Because you don't return (as you should have) you are setting yourself up for a screwup where you execute code you should not. And guess not, your 'else' clause is deffective. You are missing curly braces, so teh for loop below is executed even if file open operation failed.

因为你没有返回(正如你应该的那样),你正在为自己设置一个你不应该执行的代码。猜不是,你的“其他”条款是有缺陷的。您缺少大括号,因此即使文件打开操作失败,也会执行下面的循环。

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

10? Seems like a typo as you likely meant 100. This would not have happened if you used sizeof.

10?看起来像一个拼写错误你可能意味着100.如果你使用sizeof,这不会发生。

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

It is unclear what are you doing here. It seems you wanted to do strcmp starting from read[0], read1 and so on. But you read new data which replace stuff in the original buffer and then you advance it by one. This makes no sense whatsoever. Finally, you are doing it wrong anyway. read[j] does not evaluate to an address and once more the compiler would have told you that if you asked it to.

目前还不清楚你在这做什么。看来你想从read [0],read1等开始做strcmp。但是你读取的新数据可以替换原始缓冲区中的内容,然后将其提前一个。这没有任何意义。最后,无论如何你做错了。 read [j]不会计算到一个地址,编译器会再次告诉你,如果你要求它。

strcmp approach is very bad anyway. Try an approach where you try to match the first character and work from there.

无论如何,strcmp方法非常糟糕。尝试一种尝试匹配第一个角色并从那里开始工作的方法。

int main(int argc, char* argv[]) {

Standard misplaced '*'. Use char *argv[] instead.

标准放错了'*'。请改用char * argv []。

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

Highly not recommended. Validate arguments first and then have a dedicated variable which holds an amount of threads. At this point you can allocate an array.

非常不推荐。首先验证参数,然后有一个包含一定数量线程的专用变量。此时,您可以分配一个数组。

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

Similarly to threads, save the path somewhere. Referring to it as argv1 is bad style which will come back to bite you. Using argv for words is fine.

与线程类似,将路径保存在某处。把它称为argv1是一种糟糕的风格,它会回来咬你。使用argv进行单词很好。

However, this is wrong in general. You setup a local struct and pass it to a thread, then immediately change it. So what happens is at the end of the day all your threads are counting the same word. But the word they were counting changed along the way.

但是,这一般是错误的。您设置了一个本地结构并将其传递给一个线程,然后立即更改它。所以发生的事情是在一天结束时所有线程都在计算相同的单词。但他们计算的这个词一直在改变。

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

Go figure. You did not have a variable which held an amount of threads and this lead to this inconsistency (argc - 1 vs argc - 2).

去搞清楚。你没有一个持有大量线程的变量,这导致了这种不一致(argc - 1 vs argc - 2)。

In general this issue could have been solved for the most part with proper reading of compiler warnings, and avoided in general if basic good practices were employed.

一般来说,这个问题在很大程度上可以通过正确阅读编译器警告来解决,并且如果采用基本的良好实践,一般可以避免。

Of course bugs happen regardless of that and in such cases you can at the very least narrow them down.

当然,无论如何都会发生错误,在这种情况下,你至少可以缩小范围。

Finally, few words about the general approach. It is unclear what was the point of the exercise. You actually have to force yourself to use anything more than pthread_create and pthread_join. Let's assume the only requirement is to use threads.

最后,关于一般方法的话很少。目前尚不清楚这项运动的重点是什么。实际上你必须强迫自己使用除pthread_create和pthread_join以外的任何东西。我们假设唯一的要求是使用线程。

I don't know if they force you to open the file multiple times or what. Opening and reading stuff multiple times is not only wasteful but opens you up for a situation where the file is replaced and some threads open a different one.

我不知道他们是否强迫你多次打开文件或者是什么。多次打开和读取内容不仅浪费,而且可以打开更换文件并打开不同文件的情况。

An OK solution would open the file once in main. Once open, you would mmap the file and fstat for size. If for some reason you can't use mmap, you would malloc a large enough buffer and read the file.

一个OK解决方案将在main中打开一次文件。打开后,您将mmap文件和fstat的大小。如果由于某种原因你不能使用mmap,你可以malloc足够大的缓冲区并读取文件。

Then all threads can get an address of that buffer, the word to look for and an address to which they should store the counter (each thread gets a different address).

然后所有线程都可以获得该缓冲区的地址,要查找的字以及它们应该存储计数器的地址(每个线程获得不同的地址)。

When all threads exited you loop over to sum the result.

当所有线程退出时,您将循环以对结果求和。

Either way no locking is involved.

无论哪种方式都不涉及锁定。

#2


In main() your two i loops are different

在main()中,你的两个i循环是不同的

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

and then

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

and in this second loop you are referencing threads[argc-2] which was not created in the first loop.

在第二个循环中,您引用的是未在第一个循环中创建的线程[argc-2]。

#3


reading (upto) 10 characters at a time (probably) will miss some instances of the word being search for.

一次读取(最多)10个字符(可能)会遗漏搜索单词的某些实例。

the strcmp() always starts at the beginning of the 10 characters.

strcmp()始终从10个字符的开头开始。

1) need to look for the target word at any location in the file.

1)需要在文件中的任何位置查找目标字。

2) need to look for the target word at any location in the read-in buffer.

2)需要在读入缓冲区中的任何位置查找目标字。

Suggest:

0) clear input buffer
1) input one char at a time, 
accumulating characters in the input buffer,
2) when a word separator found, (for instance a space or EOF)
3) then check if the word matches the target word.  
4) if matches, increment count.   
5) if EOF, then exit, else goto 0 

#1


First off, your code is terribly formatted. It's not even consistent. It also does not appear you are compiling with warnings enabled.

首先,您的代码格式非常糟糕。它甚至不一致。您似乎也没有启用警告进行编译。

If you are a university course and they did not tell you how do format the code and compile with warnings, I strongly suggest you ask your tutors what gives.

如果你是大学课程,他们没有告诉你如何格式化代码并编译警告,我强烈建议你问你的导师给出了什么。

If using gcc, add -Wall -Wextra. For coding style, I recommend stealing one either from Linux or FreeBSD. There are various editors which format the code for you, including real editors like vim (which is worth trying out even though it may look harsh).

如果使用gcc,请添加-Wall -Wextra。对于编码风格,我建议从Linux或FreeBSD中窃取一个。有各种编辑器为您编写代码格式,包括像vim这样的真正编辑器(即使看起来很苛刻,也值得尝试)。

Your coding style helps you screw yourself over.

您的编码风格可以帮助您克制自己。

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

buf is unused, which you would learn if you had warnings enabled. read is a bad name.

buf未使用,如果您启用了警告,您将学到它。读是一个坏名字。

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

Because you don't return (as you should have) you are setting yourself up for a screwup where you execute code you should not. And guess not, your 'else' clause is deffective. You are missing curly braces, so teh for loop below is executed even if file open operation failed.

因为你没有返回(正如你应该的那样),你正在为自己设置一个你不应该执行的代码。猜不是,你的“其他”条款是有缺陷的。您缺少大括号,因此即使文件打开操作失败,也会执行下面的循环。

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

10? Seems like a typo as you likely meant 100. This would not have happened if you used sizeof.

10?看起来像一个拼写错误你可能意味着100.如果你使用sizeof,这不会发生。

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

It is unclear what are you doing here. It seems you wanted to do strcmp starting from read[0], read1 and so on. But you read new data which replace stuff in the original buffer and then you advance it by one. This makes no sense whatsoever. Finally, you are doing it wrong anyway. read[j] does not evaluate to an address and once more the compiler would have told you that if you asked it to.

目前还不清楚你在这做什么。看来你想从read [0],read1等开始做strcmp。但是你读取的新数据可以替换原始缓冲区中的内容,然后将其提前一个。这没有任何意义。最后,无论如何你做错了。 read [j]不会计算到一个地址,编译器会再次告诉你,如果你要求它。

strcmp approach is very bad anyway. Try an approach where you try to match the first character and work from there.

无论如何,strcmp方法非常糟糕。尝试一种尝试匹配第一个角色并从那里开始工作的方法。

int main(int argc, char* argv[]) {

Standard misplaced '*'. Use char *argv[] instead.

标准放错了'*'。请改用char * argv []。

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

Highly not recommended. Validate arguments first and then have a dedicated variable which holds an amount of threads. At this point you can allocate an array.

非常不推荐。首先验证参数,然后有一个包含一定数量线程的专用变量。此时,您可以分配一个数组。

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

Similarly to threads, save the path somewhere. Referring to it as argv1 is bad style which will come back to bite you. Using argv for words is fine.

与线程类似,将路径保存在某处。把它称为argv1是一种糟糕的风格,它会回来咬你。使用argv进行单词很好。

However, this is wrong in general. You setup a local struct and pass it to a thread, then immediately change it. So what happens is at the end of the day all your threads are counting the same word. But the word they were counting changed along the way.

但是,这一般是错误的。您设置了一个本地结构并将其传递给一个线程,然后立即更改它。所以发生的事情是在一天结束时所有线程都在计算相同的单词。但他们计算的这个词一直在改变。

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

Go figure. You did not have a variable which held an amount of threads and this lead to this inconsistency (argc - 1 vs argc - 2).

去搞清楚。你没有一个持有大量线程的变量,这导致了这种不一致(argc - 1 vs argc - 2)。

In general this issue could have been solved for the most part with proper reading of compiler warnings, and avoided in general if basic good practices were employed.

一般来说,这个问题在很大程度上可以通过正确阅读编译器警告来解决,并且如果采用基本的良好实践,一般可以避免。

Of course bugs happen regardless of that and in such cases you can at the very least narrow them down.

当然,无论如何都会发生错误,在这种情况下,你至少可以缩小范围。

Finally, few words about the general approach. It is unclear what was the point of the exercise. You actually have to force yourself to use anything more than pthread_create and pthread_join. Let's assume the only requirement is to use threads.

最后,关于一般方法的话很少。目前尚不清楚这项运动的重点是什么。实际上你必须强迫自己使用除pthread_create和pthread_join以外的任何东西。我们假设唯一的要求是使用线程。

I don't know if they force you to open the file multiple times or what. Opening and reading stuff multiple times is not only wasteful but opens you up for a situation where the file is replaced and some threads open a different one.

我不知道他们是否强迫你多次打开文件或者是什么。多次打开和读取内容不仅浪费,而且可以打开更换文件并打开不同文件的情况。

An OK solution would open the file once in main. Once open, you would mmap the file and fstat for size. If for some reason you can't use mmap, you would malloc a large enough buffer and read the file.

一个OK解决方案将在main中打开一次文件。打开后,您将mmap文件和fstat的大小。如果由于某种原因你不能使用mmap,你可以malloc足够大的缓冲区并读取文件。

Then all threads can get an address of that buffer, the word to look for and an address to which they should store the counter (each thread gets a different address).

然后所有线程都可以获得该缓冲区的地址,要查找的字以及它们应该存储计数器的地址(每个线程获得不同的地址)。

When all threads exited you loop over to sum the result.

当所有线程退出时,您将循环以对结果求和。

Either way no locking is involved.

无论哪种方式都不涉及锁定。

#2


In main() your two i loops are different

在main()中,你的两个i循环是不同的

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

and then

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

and in this second loop you are referencing threads[argc-2] which was not created in the first loop.

在第二个循环中,您引用的是未在第一个循环中创建的线程[argc-2]。

#3


reading (upto) 10 characters at a time (probably) will miss some instances of the word being search for.

一次读取(最多)10个字符(可能)会遗漏搜索单词的某些实例。

the strcmp() always starts at the beginning of the 10 characters.

strcmp()始终从10个字符的开头开始。

1) need to look for the target word at any location in the file.

1)需要在文件中的任何位置查找目标字。

2) need to look for the target word at any location in the read-in buffer.

2)需要在读入缓冲区中的任何位置查找目标字。

Suggest:

0) clear input buffer
1) input one char at a time, 
accumulating characters in the input buffer,
2) when a word separator found, (for instance a space or EOF)
3) then check if the word matches the target word.  
4) if matches, increment count.   
5) if EOF, then exit, else goto 0