This question comes from a discussion about a practical problem, "Replacing multiple new lines in a file with just one". Something went wrong while using a Cygwin terminal running on a Windows 8.1 machine.
Since the end-of-line terminator differs between platforms (\n, \r, or \r\n), is it necessary to write a "portable" if(c=='\n') to make it work well on Linux, Windows and OS X? Or is the best practice simply to convert the file with commands/tools?
#include <stdio.h>

int main(void)
{
    FILE *pFile;
    int c;
    int n = 0;

    pFile = fopen("myfile.txt", "r");
    if (pFile == NULL) perror("Error opening file");
    else
    {
        do {
            c = fgetc(pFile);
            if (c == '\n') n++; // will it work fine on different platforms?
        } while (c != EOF);
        fclose(pFile);
        printf("The file contains %d lines.\n", n);
    }
    return 0;
}
Update 1:
Will the CRT always convert line endings into '\n'?
1 Answer
#1
If an input file is opened in binary mode (the character 'b' in the mode string), then it is necessary to worry about the possible presence of '\r' before '\n'.
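To make the binary-mode case concrete, here is a minimal sketch that counts how many "\r\n" pairs a stream actually contains. The helper name count_crlf_pairs is hypothetical and is not part of the question's code; it assumes the stream was opened with "rb", so that no translation happens before the bytes reach the program.

```c
#include <stdio.h>

/* Hypothetical helper: count "\r\n" pairs in a stream that was opened
   in binary mode ("rb"), where the bytes arrive untranslated. */
long count_crlf_pairs(FILE *fp)
{
    long pairs = 0;
    int prev = EOF;   /* previous byte read, EOF before the first byte */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (prev == '\r' && c == '\n')
            pairs++;
        prev = c;
    }
    return pairs;
}
```

A caller would typically open the file with fopen("myfile.txt", "rb") and pass the resulting FILE pointer. A non-zero result means the reading code will see '\r' characters that a plain comparison with '\n' does not account for.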
If the file is not opened in binary mode, then it is not necessary to worry about the presence of '\r' before '\n', provided the file uses the host platform's native line endings. The translation is handled before the input is received by your code, either by a relevant system function (e.g. a device driver that reads input from disk or from stdin) or by the implementation of the functions you use to read input from the file.
If you are transferring files between systems (e.g. writing the file under Linux and transferring it to a Windows system, where a program tries to read it in), then you have several options:
- Write and read the file in non-binary mode, and do a relevant translation of the file when transferring it between systems. If using ftp, this can be handled by transferring the file using text (ASCII) mode rather than binary mode. If the file is transferred in binary mode, then you will need to run the file through dos2unix (if transferring the file to Unix) or through unix2dos (going the other way).
- Do all your I/O in binary mode, transfer the files between systems using binary mode, and never read them in non-binary mode. Among other things, this gives you explicit control over what data is in the file.
- Write your file in text mode and transfer the file as you see fit. Then only read in binary mode and, when your reading code encounters a \r\n pair, drop the '\r' character.
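The third option can be sketched as a thin wrapper around fgetc. This is only an illustration under stated assumptions: the names fgetc_drop_cr and count_lines are hypothetical, the stream is assumed to be opened with "rb", and a lone '\r' that is not followed by '\n' is passed through unchanged.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character from a binary-mode
   stream, dropping a '\r' that is immediately followed by '\n'.
   A lone '\r' is returned as-is. */
int fgetc_drop_cr(FILE *fp)
{
    int c = fgetc(fp);

    if (c == '\r') {
        int next = fgetc(fp);
        if (next == '\n')
            return '\n';        /* "\r\n" collapses to "\n" */
        if (next != EOF)
            ungetc(next, fp);   /* not a pair: push the look-ahead back */
        return '\r';
    }
    return c;
}

/* Count '\n' characters the way the question's loop does, but reading
   through fgetc_drop_cr so "\r\n" and "\n" are treated alike. */
int count_lines(FILE *fp)
{
    int n = 0;
    int c;

    while ((c = fgetc_drop_cr(fp)) != EOF) {
        if (c == '\n')
            n++;
    }
    return n;
}
```

With this wrapper, the question's program would only need to open the file with "rb" instead of "r" and call fgetc_drop_cr in place of fgetc. The single ungetc call stays within the one character of pushback that the C standard guarantees.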
The last is arguably the most robust: the writing code might include \r before \n characters, or it might not, but the reading code simply ignores any '\r' character that it encounters before a '\n' character. Such code will probably even cope if the files are edited by hand before being read (e.g. with a text editor, which might be separately configured to either insert or remove \r and \n).
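The question also mentions a bare \r as a possible terminator, which the approach above leaves untouched. As an assumption that goes slightly beyond the answer, the following sketch counts a line for each of "\n", "\r\n" and a lone "\r". The name count_lines_any_eol is hypothetical, and the stream is again assumed to be opened in binary mode.

```c
#include <stdio.h>

/* Hypothetical helper: count line terminators in a binary-mode stream,
   treating "\n", "\r\n" and a lone "\r" as one terminator each. */
long count_lines_any_eol(FILE *fp)
{
    long lines = 0;
    int prev = EOF;   /* previous byte read, EOF before the first byte */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r')
            lines++;              /* lone "\r", or the first half of "\r\n" */
        else if (c == '\n' && prev != '\r')
            lines++;              /* bare "\n" */
        prev = c;                 /* a '\n' right after '\r' was already counted */
    }
    return lines;
}
```

Whether a lone '\r' should count as a line ending depends on where the files come from, so this is a design choice rather than a requirement of the answer.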