Is it mandatory to escape tab characters in C and C++?

Time: 2022-08-21 23:55:09

In C and C++ (and several other languages), horizontal tabulators (ASCII code 9) in character and string constants are written in escaped form, as '\t' and "\t". However, I regularly type the unescaped tabulator character in string literals, for example in "A B" (there is a TAB between A and B), and clang++ at least does not seem to mind: the string appears to be equivalent to "A\tB". I prefer the unescaped version because long, indented multi-line strings are easier to read in the source code.
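To check what the compiler actually stored, the bytes of the literal can be inspected at run time. This is a minimal sketch in C; `count_tabs` and `report` are hypothetical helper names, not part of any library. If a raw TAB typed between A and B really is the same character as `\t`, passing that literal to `report` should print the same line as `report("A\tB")`, which prints `3 chars, 1 tab(s), first at 1`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns how many horizontal tabs ('\t') s contains
   and stores the index of the first one in *first (-1 if there is none). */
static size_t count_tabs(const char *s, long *first)
{
    assert(s != NULL && first != NULL);
    size_t n = 0;
    *first = -1;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\t') {
            if (n == 0)
                *first = (long)i;
            n++;
        }
    }
    return n;
}

/* Prints the length, tab count and first tab index of s. */
static void report(const char *s)
{
    long first;
    size_t tabs = count_tabs(s, &first);
    printf("%zu chars, %zu tab(s), first at %ld\n", strlen(s), tabs, first);
}
```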

Now I am asking myself whether this is generally legal in C and C++ or just supported by my compiler. How portable are unescaped tabulators in character and string constants?

Surprisingly, I could not find an answer to this seemingly simple question, either with Google or on * (I only found this vaguely related question).

4 answers

#1


56  

Yes, you can include a tab character in a string or character literal, at least according to C++11. The characters allowed in a string literal are:

any member of the source character set except the double-quote ", backslash \, or new-line character

(from the C++11 standard, Annex A.2)

and the source character set includes:

the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

(from the C++11 standard, paragraph 2.3.1)
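The two C++11 rules quoted above can be combined into a simplified predicate. The sketch below is written in C so it applies to both halves of the question; `is_plain_s_char` and `basic_source_set` are hypothetical names, and the model deliberately ignores extended characters, universal-character-names and escape sequences. It shows that a horizontal tab passes both checks, while `"`, `\` and new-line do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The 96 members of the C++11 basic source character set: space, HT, VT,
   FF, new-line and 91 graphic characters. Letters are listed explicitly
   because 'a'..'z' need not be contiguous (e.g. in EBCDIC). */
static const char basic_source_set[] =
    " \t\v\f\n"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

static_assert(sizeof basic_source_set - 1 == 96, "96 basic source characters");

/* Hypothetical, simplified model of the s-char rule: may c appear
   unescaped inside a "..." literal? */
static bool is_plain_s_char(char c)
{
    if (c == '\0' || strchr(basic_source_set, c) == NULL)
        return false;                 /* not in the basic source character set */
    return c != '"' && c != '\\' && c != '\n';
}
```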

UPDATE: I've just noticed that you're asking about two different languages. For C99, the answer is also yes. The wording is different, but basically says the same thing:

In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or [...]

where both the source and execution character sets include

control characters representing horizontal tab, vertical tab, and form feed.

#2


27  

It's completely legal to put a tab character directly into a character string or character literal. The C and C++ standards require the source character set to include a tab character, and string and character literals may contain any character in the source character set except backslash, quote or apostrophe (as appropriate) and newline.

So it's portable. But it is not a good idea, since there is no way a reader can distinguish between different kinds of whitespace. It is also quite common for text editors, mail programs, and the like to reformat tabs, so bugs may be introduced into the program in the course of such operations.
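If raw tabs do end up in literals, a small check can at least find them before an editor or mail program reformats them. The following is a hypothetical, deliberately simplified line scanner (not an existing tool): it tracks whether it is inside a `"..."` literal, skips escape sequences such as `\t`, and reports the column of the first raw tab. It ignores comments, character literals and raw string literals, so treat it as a sketch rather than a real lint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lint: returns the 0-based column of the first raw (unescaped)
   tab inside a "..." literal on one source line, or -1 if there is none.
   Simplifications: ignores comments, character literals and raw strings,
   and assumes every literal starts and ends on this line. */
static long find_raw_tab_in_string(const char *line)
{
    assert(line != NULL);
    int in_string = 0;
    for (long i = 0; line[i] != '\0'; i++) {
        char c = line[i];
        if (!in_string) {
            if (c == '"')
                in_string = 1;        /* opening quote */
        } else if (c == '\\' && line[i + 1] != '\0') {
            i++;                      /* skip the escaped char, e.g. the t in \t */
        } else if (c == '"') {
            in_string = 0;            /* closing quote */
        } else if (c == '\t') {
            return i;                 /* raw tab inside the literal */
        }
    }
    return -1;
}
```

For example, `find_raw_tab_in_string("x = \"A\tB\";")` returns 6, while the escaped form `"x = \"A\\tB\";"` returns -1.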

#3


9  

If you enter a tab into an input, then your string will contain a literal tab character, and it will stay a tab character; it won't be magically translated into \t internally.
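The distinction can be stated in code: the escape sequence `\t` denotes one character, whereas `\\t` is a backslash followed by the letter t. This is a small C sketch; `is_single_tab` is a hypothetical helper, and the `static_assert`s (C11) check the literal sizes at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "\t" is ONE character, the horizontal tab; "\\t" is TWO characters,
   a backslash followed by the letter t. A TAB typed directly into a
   literal (or read from input) is the former, never the latter. */
static_assert(sizeof "\t" == 2, "tab + terminating NUL");
static_assert(sizeof "\\t" == 3, "backslash + 't' + terminating NUL");

/* Hypothetical helper: true if s consists of exactly one tab character. */
static bool is_single_tab(const char *s)
{
    assert(s != NULL);
    return strlen(s) == 1 && s[0] == '\t';
}
```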

Same goes for writing code - you can embed literal tab characters in your strings. However, consider this:

     T     T     T        <--tab stops
012345012345012345012345
foo1 = 'a\tb';
foo2 = 'a  b'; // pressed tab in the editor
foo3 = 'a  b'; // hit space twice in the editor

Unless you put the cursor on the whitespace between a and b and check how many characters are in there, there is essentially NO way to determine whether it contains a tab or actual space characters. With the \t version, however, it is immediately obvious that it is a tab.
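To make the difference visible without moving the cursor, a string's tabs can be rewritten as the two characters `\t` before display. This is a minimal sketch; `make_tabs_visible` is a hypothetical helper, and it truncates rather than overflows when the buffer is too small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst (capacity cap > 0), writing each
   tab as the two characters \t so tabs and spaces can be told apart when the
   result is displayed. Output is truncated if dst is too small; returns the
   number of characters written, excluding the terminating NUL. */
static size_t make_tabs_visible(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t n = 0;
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\t') ? 2 : 1;
        if (n + need >= cap)
            break;                    /* keep one byte for the terminator */
        if (*src == '\t') {
            dst[n++] = '\\';
            dst[n++] = 't';
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Applied to the text between the quotes in `foo2` and `foo3` above, the first would come out as `a\tb` and the second as `a  b`, even though the two look identical in the editor.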

#4


2  

When you press the TAB key, you get whatever code point your system maps that key to. That code point may or may not be a tab on the system where the program runs. When you put \t in a literal, the compiler replaces it with the appropriate code point for the target system. So if you want to be sure that you get a tab on the system where the program runs, use \t. That's its job.
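The number that `\t` stands for is fixed by the execution character set the compiler targets, not by the escape itself. This sketch (hypothetical helper names) prints it; on ASCII-based sets such as ASCII and UTF-8 it is 9, while common EBCDIC code pages use 0x05. The only property C guarantees everywhere is the one checked by the `static_assert`: members of the basic execution character set are nonnegative when stored in a `char`, and the tab is distinct from the null character.

```c
#include <assert.h>
#include <stdio.h>

/* C guarantees that members of the basic execution character set are
   nonnegative in a char, and HT is not the null character, so this holds
   on every conforming implementation. */
static_assert('\t' > 0, "horizontal tab is a positive char value");

/* Hypothetical helper: the tab's value in the execution character set. */
static int tab_code_point(void)
{
    return '\t';
}

/* Prints the tab's value as compiled for the target system. */
static void print_tab_code_point(void)
{
    printf("horizontal tab = 0x%02X\n", (unsigned)tab_code_point());
}
```

On a typical ASCII or UTF-8 system, `print_tab_code_point()` prints `horizontal tab = 0x09`.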
