I'm programming in C on Windows (the system language is Japanese), and I have a question about EOF in binary and ASCII files.
I asked this question last week and a kind person helped me, but I still don't really understand how the program behaves when reading a binary or an ASCII file.
I did the following test:
Test1:
int oneChar;
FILE *iFile = fopen("myFile.tar.gz", "rb");  /* binary mode */
while ((oneChar = fgetc(iFile)) != EOF) {
    printf("%d ", oneChar);
}
Test2:
int oneChar;
FILE *iFile = fopen("myFile.tar.gz", "r");   /* text mode */
while ((oneChar = fgetc(iFile)) != EOF) {
    printf("%d ", oneChar);
}
In Test1, everything worked perfectly for both binary and ASCII files. In Test2, however, the program stopped reading when it encountered 0x1A in a binary file. (Does this mean that 0x1A == EOF?) The ASCII table tells me that 0x1A is a control character called substitute (whatever that means...). Yet when I ran printf("%d", EOF), it printed -1...
I also found this question, which tells me that the OS knows exactly where a file ends, so I don't really need to look for an EOF marker in the file, because EOF is outside the range of a byte (but then what about 0x1A?).
Can someone clear things up a little for me? Thanks in advance.
3 Solutions
#1
7
This is a Windows-specific quirk of text-mode files: the SUB character (0x1A), which is typed as Ctrl+Z, is interpreted as end of file, so fgetc returns EOF when it reaches it. You do not need a 0x1A in your text file in order to get EOF back from fgetc, though: once you reach the actual end of the file, EOF is returned.
The standard does not define 0x1A as the char value that represents EOF. The EOF constant is of type int, with a negative value outside the range of unsigned char. In fact, the reason fgetc returns an int rather than a char is so that it can return a special value for EOF.
#2
5
The convention of ending a file with Ctrl-Z originated with CP/M, a very old operating system for 8080/Z80 microcomputers. Its file system did not keep track of file sizes down to the byte level, only to the 128-byte sector level, so there needed to be another way to mark the end-of-file.
Microsoft's DOS was made to be as compatible with CP/M as possible, so it kept the convention when reading text files. By this time the file size was kept by the file system so it wasn't strictly necessary, just retained for backward compatibility.
This convention has persisted to the present day in the C and C++ libraries for Windows; when you open a file in text mode, every character is checked for Ctrl-Z and the end-of-file flag is set if it's detected. You're seeing the effects of backwards compatibility taken to an extreme, back to systems that are almost 40 years old.
#3
0
Found a terrific article that answers all of these questions! https://latedev.wordpress.com/2012/12/04/all-about-eof/