Why does an fread loop require an extra Ctrl+D to signal EOF with glibc?

Time: 2021-12-29 18:44:34

Normally, to indicate EOF to a program attached to standard input on a Linux terminal, I need to press Ctrl+D once if I just pressed Enter, or twice otherwise. I noticed that the patch command is different, though. With it, I need to press Ctrl+D twice if I just pressed Enter, or three times otherwise. (Doing cat | patch instead doesn't have this oddity. Also, if I press Ctrl+D before typing any real input at all, it doesn't have this oddity.) Digging into patch's source code, I traced this back to the way it loops on fread. Here's a minimal program that does the same thing:

#include <stdio.h>

int main(void) {
    char buf[4096];
    size_t charsread;
    while((charsread = fread(buf, 1, sizeof(buf), stdin)) != 0) {
        printf("Read %zu bytes. EOF: %d. Error: %d.\n", charsread, feof(stdin), ferror(stdin));
    }
    printf("Read zero bytes. EOF: %d. Error: %d. Exiting.\n", feof(stdin), ferror(stdin));
    return 0;
}

When compiling and running the above program exactly as-is, here's a timeline of events:

  1. My program calls fread.
  2. fread calls the read system call.
  3. I type "asdf".
  4. I press Enter.
  5. The read system call returns 5.
  6. fread calls the read system call again.
  7. I press Ctrl+D.
  8. The read system call returns 0.
  9. fread returns 5.
  10. My program prints Read 5 bytes. EOF: 1. Error: 0.
  11. My program calls fread again.
  12. fread calls the read system call.
  13. I press Ctrl+D again.
  14. The read system call returns 0.
  15. fread returns 0.
  16. My program prints Read zero bytes. EOF: 1. Error: 0. Exiting.

Why does this means of reading stdin have this behavior, unlike the way that every other program seems to read it? Is this a bug in patch? How should this kind of loop be written to avoid this behavior?

UPDATE: This seems to be related to libc. I originally experienced it on glibc 2.23-0ubuntu3 from Ubuntu 16.04. @Barmar noted in the comments that it doesn't happen on macOS. After hearing this, I tried compiling the same program against musl 1.1.9-1, also from Ubuntu 16.04, and it didn't have this problem. On musl, the sequence of events has steps 12 through 14 removed, which is why it doesn't have the problem, but is otherwise the same (except for the irrelevant detail of readv in place of read).
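
The difference between the two libcs can be pictured with a toy model. This is only a simplified sketch under my own assumptions, not the actual glibc or musl code: toy_stream, toy_fread, the sticky flag and the reads counter are all hypothetical names made up for illustration. With sticky set to 0 the second call goes back to read (the extra Ctrl+D seen with glibc); with sticky set to 1 it returns immediately (what musl appears to do).

```c
#include <stddef.h>
#include <unistd.h>

/* Toy stand-in for a FILE: a descriptor plus an end-of-file indicator. */
struct toy_stream {
    int fd;
    int eof;
};

/* Simplified model of an fread-style fill loop (hypothetical, not libc code).
   sticky != 0: return at once if the EOF indicator is already set.
   sticky == 0: ignore the indicator and call read again.
   *reads counts how many read system calls were issued. */
size_t toy_fread(struct toy_stream *s, char *buf, size_t want,
                 int sticky, unsigned *reads) {
    size_t got = 0;
    if (sticky && s->eof)
        return 0;
    while (got < want) {
        ssize_t n = read(s->fd, buf + got, want - got);
        ++*reads;
        if (n <= 0) {          /* 0 means end of input; errors are not modelled */
            if (n == 0)
                s->eof = 1;
            break;
        }
        got += (size_t)n;
    }
    return got;
}
```

On a terminal, each read that returns 0 corresponds to one Ctrl+D typed on an empty line, so every extra read in the non-sticky mode costs one more keypress.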

Now, the question becomes: is glibc wrong in its behavior, or is patch wrong in assuming that its libc won't have this behavior?

1 Answer

I've managed to confirm that this is due to an unambiguous bug in glibc versions prior to 2.28 (commit 2cc7bad). Relevant quotes from the C standard:

The byte input/output functions — those functions described in this subclause that perform input/output: [...], fread

The byte input functions read characters from the stream as if by successive calls to the fgetc function.

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream.

(emphasis on "or" mine)

The following program demonstrates the bug with fgetc:

#include <stdio.h>

int main(void) {
    while(fgetc(stdin) != EOF) {
        puts("Read and discarded a character from stdin");
    }
    puts("fgetc(stdin) returned EOF");
    if(!feof(stdin)) {
        /* Included only for completeness. Doesn't occur in my testing. */
        puts("Standard violation! After fgetc returned EOF, the end-of-file indicator wasn't set");
        return 1;
    }
    if(fgetc(stdin) != EOF) {
        /* This happens with glibc in my testing. */
        puts("Standard violation! When fgetc was called with the end-of-file indicator set, it didn't return EOF");
        return 1;
    }
    /* This happens with musl in my testing. */
    puts("No standard violation detected");
    return 0;
}

To demonstrate the bug:

  1. Compile the program and execute it
  2. Press Ctrl+D
  3. Press Enter

The exact bug is that if the end-of-file stream indicator is set, but the stream is not at end-of-file, glibc's fgetc will return the next character from the stream, rather than EOF as the standard requires.
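
To make the required behavior concrete, here is a minimal sketch of a wrapper that enforces the "indicator set means EOF" rule no matter which libc is underneath. sticky_fgetc is a hypothetical name of my own, not a function from any library.

```c
#include <stdio.h>

/* Hypothetical wrapper (not a libc function): once the end-of-file
   indicator is set, return EOF without touching the underlying stream,
   which is what the quoted fgetc wording requires. */
int sticky_fgetc(FILE *stream) {
    if (feof(stream))
        return EOF;
    return fgetc(stream);
}
```

On a libc that already follows the standard the extra feof check is redundant but harmless.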

Since fread is defined in terms of fgetc, this is the cause of what I originally saw. It's previously been reported as glibc bug #1190 and has been fixed since commit 2cc7bad in February 2018, which landed in glibc 2.28 in August 2018.
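
As for how to write the loop so that it behaves the same on affected glibc versions, one option is to stop as soon as the stream reports end-of-file or an error, instead of waiting for fread to return 0. The following is only a sketch; count_until_eof and its calls counter are hypothetical names used for illustration, and this is not the fix that went into patch or glibc.

```c
#include <stdio.h>

/* Hypothetical helper: reads `in` to the end and returns the total number
   of bytes seen. It never calls fread again once the end-of-file or error
   indicator is set, so it does not depend on fread honoring a sticky EOF.
   *calls receives the number of fread calls that were made. */
size_t count_until_eof(FILE *in, unsigned *calls) {
    char buf[4096];
    size_t total = 0;
    *calls = 0;
    for (;;) {
        size_t n = fread(buf, 1, sizeof buf, in);
        ++*calls;
        total += n;   /* a real program would process buf[0..n) here */
        if (n == 0 || feof(in) || ferror(in))
            break;
    }
    return total;
}
```

Calling count_until_eof(stdin, &calls) from main in place of the original while loop should need the same number of Ctrl+D presses as other programs, because the short read that sets the end-of-file indicator also ends the loop.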
