What is the simplest way to count the newlines in an ASCII file?

Time: 2022-01-13 04:31:53

Which is the fastest way to get the lines of an ASCII file?

5 solutions

#1


19  

Normally you read files in C using fgets. You can also use scanf("%[^\n]"), but quite a few people reading the code are likely to find that confusing and foreign.

Edit: on the other hand, if you really do just want to count lines, a slightly modified version of the scanf approach can work quite nicely:

while (EOF != (scanf("%*[^\n]"), scanf("%*c"))) 
    ++lines;

The advantage of this is that with the '*' in each conversion, scanf reads and matches the input, but does nothing with the result. That means we don't have to waste memory on a large buffer to hold the content of a line that we don't care about. With a fixed buffer we would also still risk meeting a line longer than the buffer, so the count would end up wrong unless we went to even more work to figure out whether the input we read ended with a newline.

Unfortunately, we do have to break up the scanf into two pieces like this. scanf stops scanning when a conversion fails, and if the input contains a blank line (two consecutive newlines) we expect the first conversion to fail. Even if that fails, however, we want the second conversion to happen, to read the next newline and move on to the next line. Therefore, we attempt the first conversion to "eat" the content of the line, and then do the %c conversion to read the newline (the part we really care about). We continue doing both until the second call to scanf returns EOF (which will normally be at the end of the file, though it can also happen in case of something like a read error).
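To make the two-step idea concrete, here is a minimal sketch of it wrapped in a function that reads from any stream through fscanf. The name count_lines_scanf and the FILE * parameter are assumptions made for this illustration, not part of the original answer. As in the snippet above, a final line with no trailing newline is not counted.

```c
#include <assert.h>
#include <stdio.h>

/* Counts lines terminated by '\n' on the given stream.
   count_lines_scanf is a name invented for this sketch. */
unsigned long count_lines_scanf(FILE *in)
{
    unsigned long lines = 0;

    assert(in != NULL);
    for (;;) {
        /* Step 1: skip the content of the line. On a blank line this
           conversion fails and consumes nothing, which is fine. */
        int skipped = fscanf(in, "%*[^\n]");
        (void)skipped;

        /* Step 2: consume one character, which is the newline whenever
           any input remains. EOF here means nothing was left to read. */
        if (fscanf(in, "%*c") == EOF)
            break;
        ++lines;
    }
    return lines;
}
```

Calling count_lines_scanf(stdin) gives the same behaviour as the loop above.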

Edit2: Of course, there is another possibility that's (at least arguably) simpler and easier to understand:

int ch;

while (EOF != (ch=getchar()))
    if (ch=='\n')
        ++lines;

The only part of this that some people find counterintuitive is that ch must be defined as an int, not a char for the code to work correctly.
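The reason is that getchar returns an int so that EOF can be told apart from every valid byte value. If ch were a char, a 0xFF byte could compare equal to EOF where char is signed, and EOF might never match where char is unsigned. The sketch below shows the int version as a function. The name count_newlines and the stream parameter are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Counts '\n' characters on the given stream.
   count_newlines is a hypothetical helper name used for this sketch. */
unsigned long count_newlines(FILE *in)
{
    unsigned long lines = 0;
    int ch;  /* int, so EOF stays distinct from every byte value 0..255 */

    assert(in != NULL);
    while ((ch = getc(in)) != EOF)
        if (ch == '\n')
            ++lines;
    return lines;
}
```

With this version a 0xFF byte in the middle of the input does not end the loop early.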

#2


4  

Here's a solution based on fgetc() which will work for lines of any length and doesn't require you to allocate a buffer.

#include <stdio.h>

int main()
{
    FILE                *fp = stdin;    /* or use fopen to open a file */
    int                 c;              /* Nb. int (not char) for the EOF */
    unsigned long       newline_count = 0;

        /* count the newline characters */
    while ( (c=fgetc(fp)) != EOF ) {
        if ( c == '\n' )
            newline_count++;
    }

    printf("%lu newline characters\n", newline_count);
    return 0;
}
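The comment in the code above mentions fopen as an alternative to stdin. A minimal sketch of that variant might look like the following. The function name count_newlines_in_file and the convention of returning -1 on failure are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the number of '\n' characters in the named file, or -1 if the
   file cannot be opened or a read error occurs. The name and the -1
   convention are choices made for this sketch. */
long count_newlines_in_file(const char *path)
{
    FILE *fp;
    int c;                  /* int (not char) so EOF can be detected */
    long newline_count = 0;
    int failed;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            newline_count++;
    }

    /* fgetc returns EOF for both end of file and errors; ferror tells
       the two apart. */
    failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : newline_count;
}
```

A caller would check for a negative result before printing the count.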

#3


2  

Maybe I'm missing something, but why not simply:

#include <stdio.h>
int main(void) {
  int n = 0;
  int c;
  while ((c = getchar()) != EOF) {
    if (c == '\n')
      ++n;
  }
  printf("%d\n", n);
}

If you also want to count a partial last line (i.e. a final line with no trailing newline, [^\n]EOF):

#include <stdio.h>
int main(void) {
  int n = 0;
  int pc = EOF;
  int c;
  while ((c = getchar()) != EOF) {
    if (c == '\n')
      ++n;
    pc = c;
  }
  if (pc != EOF && pc != '\n')
    ++n;
  printf("%d\n", n);
}

#4


2  

Come on, why compare every character? That is very slow: on a 10 MB file it takes about 3 s.
The solution below is faster.

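#include <stdio.h>
#include <string.h>

/* fgetline() is not a standard C function, so fgets() stands in for it here. */
unsigned long count_lines_of_file(const char *file_path) {
    FILE *fp = fopen(file_path, "r");
    unsigned long line_count = 0;
    char buf[4096];

    if (fp == NULL) {
        return 0;
    }
    /* A long line arrives in several chunks; only the chunk ending in '\n' counts. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL)
            line_count++;
    }

    fclose(fp);
    return line_count;
}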

#5


1  

What about this?

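#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 4096

int main(int argc, char** argv)
{
    unsigned long count = 0;
    size_t bytes;
    FILE* f;
    char buffer[BUFFER_SIZE];
    char* ptr;

    if (argc != 2 || !(f = fopen(argv[1], "r")))
    {
        return -1;
    }

    /* Loop on fread's return value; feof() only turns true after a read has
       already hit the end, so testing it first misbehaves on empty files and
       on sizes that are an exact multiple of BUFFER_SIZE. */
    while ((bytes = fread(buffer, sizeof(char), BUFFER_SIZE, f)) > 0)
    {
        /* memchr takes an explicit length, so the buffer needs no terminator
           and every newline in the chunk is counted exactly once. */
        ptr = buffer;
        while ((ptr = memchr(ptr, '\n', (size_t)(buffer + bytes - ptr))) != NULL)
        {
            ++count;
            ++ptr;
        }
    }

    if (ferror(f))
    {
        fclose(f);
        return -1;
    }

    fclose(f);

    printf("%lu\n", count);

    return 0;
}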
