String literal differences between C and C++

Date: 2021-06-05 16:38:35

As far as I can tell, before C++11, string literals were handled in almost exactly the same way between C and C++.

Now, I acknowledge that there are differences between C and C++ in the handling of wide string literals.

The only differences that I have been able to find are in the initialization of an array by string literal.

char str[3] = "abc"; /* OK in C but not in C++ */
char str[4] = "abc"; /* OK in C and in C++. Terminating zero at str[3] */

There is also a technical difference that only matters in C++: "abc" has type const char [4] in C++, while in C it has type char [4]. To retain C compatibility, C++ had a special rule allowing a string literal to be converted to char * as well as to const char *. That conversion was deprecated and, as of C++11, is no longer allowed.

There is also a difference in the maximum literal length that an implementation is required to support. As a practical matter, however, a compiler that handles both C and C++ is unlikely to enforce the lower C limit.

I have some interesting links that apply:

Are there any other differences?

2 Answers

#1


8  

Raw strings

A noticeable difference is that C++ string literals are a superset of C ones. Specifically, C++11 added raw string literals, which C does not support. They are defined in §2.14.5 and are commonly used for embedded HTML and XML, where " occurs frequently.

Raw strings allow you to specify your own delimiter (up to 16 characters) in the form:

R"delimiter(char sequence)delimiter"

This is particularly useful for avoiding unnecessary escape sequences. The following two examples show literals that contain " and ( without any escaping:

std::cout << R"(a"b"c")";      // empty delimiter
std::cout << '\n';
std::cout << R"aa(a("b"))aa";  // aa delimiter
// a"b"c"
// a("b")


char vs const char

Another difference, pointed out in the comments, is that string literals have type char [n] in C, as specified at §6.4.5/6:

For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

while in C++ they have type const char [n], as defined in §2.14.5/8:

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).

This doesn't change the fact that in both standards (§6.4.5/7 for C and §2.14.5/13 for C++), attempting to modify a string literal results in undefined behavior.


Unspecified vs implementation-defined

Another subtle difference is that in C, whether the character arrays of identical string literals are distinct is unspecified, as per §6.4.5/7:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values.

while in C++ this is implementation defined, as per §2.14.5/13:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.

#2


-1  

The best way to answer your question is to rewrite it as a program that compiles the same way with a C or a C++ compiler. I will assume you are using GCC, but other (correctly written) compiler toolchains should give similar results.

First I will address each point you raised, then I will give a program that provides the answer (and the proof).

  • As far as I can tell, before C++11, string literals were handled in almost exactly the same way between C and C++.

They can still be handled the same way by using various command-line parameters; in this example I will use "-fpermissive" (a cheat). You are better off finding out why you are getting warnings and writing new code that avoids any warning; only use command-line 'cheats' to compile old code.

Write new code correctly: no cheats, no warnings, and, it goes without saying, no errors.

  • Now, I acknowledge that there are differences between C and C++ in the handling of wide string literals.

There do not have to be many differences, since you can cheat most or all of them away, depending on the circumstances. But cheating is wrong: learn to program correctly and follow modern standards, not the mistakes (or awkwardness) of the past. Things are done a certain way to be helpful both to you and, in some cases, to the compiler (remember, you are not the only one who 'sees' your code).

In this case the compiler wants enough space allocated to terminate the string with '\0' (a zero byte). That allows printing functions (and some others) to be used without specifying the length of the string.

If you are simply trying to compile and run an existing program you obtained from somewhere and do not want to rewrite it, then use the cheats (if you must) to get past the diagnostics and force the compilation through to an executable.

  • The rest of what you wrote ...

No.

See this example program. I slightly modified your question to make it into a program. The result of compiling this program with a C or a C++ compiler is identical, provided g++ is given -fpermissive (see the note below).

Copy and paste the example program text below into a file called "test.c", then follow the instructions at the start. Simply 'cat' the file so you can scroll back and see it without opening a text editor, then copy and paste each of the compiler command lines (the next three).

Note that, as pointed out in the comments, running the line "g++ -S -o test_c++.s test.c" produces an error (using a modern g++ compiler), since the array is not long enough to hold the string together with its terminating zero.

You should be able to read this program and see the answer without actually compiling it, but it will compile and produce the output for you to examine should you wish to.

As you can see, the variable "str1" is not long enough to hold the string once it is null-terminated, and that produces an error on a modern (and correctly written) g++ compiler.


/* Answer for: http://*.com/questions/23145793/string-literal-differences-between-c-and-c
 *
 * cat test.c
 * gcc -S -o test_c.s test.c
 * g++ -S -o test_c++.s test.c
 * g++ -S -fpermissive -o test_c++.s test.c
 *
 */

char str1[3] = "1ab";
char str2[4] = "2ab";
char str3[]  = "3ab";

int main(void) { return 0; }


/* Comment: Executing "g++ -S -o test_c++.s test.c" produces this Error:
 *
 * test.c:10:16: error: initializer-string for array of chars is too long [-fpermissive]
 * char str1[3] = "1ab";
 *                ^
 *
 */


/* Resulting Assembly Language Output */

/*      .file   "test.c"
 *      .globl  _str1
 *      .data
 * _str1:
 *      .ascii "1ab"
 *      .globl  _str2
 * _str2:
 *      .ascii "2ab\0"
 *      .globl  _str3
 * _str3:
 *      .ascii "3ab\0"
 *      .def    ___main;    .scl    2;  .type   32; .endef
 *      .text
 *      .globl  _main
 *      .def    _main;  .scl    2;  .type   32; .endef
 * _main:
 * LFB0:
 *      .cfi_startproc
 *      pushl   %ebp
 *      .cfi_def_cfa_offset 8
 *      .cfi_offset 5, -8
 *      movl    %esp, %ebp
 *      .cfi_def_cfa_register 5
 *      andl    $-16, %esp
 *      call    ___main
 *      movl    $0, %eax
 *      leave
 *      .cfi_restore 5
 *      .cfi_def_cfa 4, 4
 *      ret
 *      .cfi_endproc
 * LFE0:
 *      .ident  "GCC: (GNU) 4.8.2"
 *
 */
