Rules for C++ string literal escape characters

Date: 2021-05-19 09:37:16

What are the rules for the escape character \ in string literals? Is there a list of all the characters that are escaped?

In particular, when I type \ in a string literal in gedit and follow it with any three digits, gedit colors them differently.

I was trying to create a std::string constructed from a literal with the character 0 followed by the null character (\0), followed by the character 0. However, the syntax highlighting alerted me that maybe this would create something like the character 0 followed by the null character (\00, aka \0), which is to say, only two characters.

For the solution to just this one problem, is this the best way to do it:

std::string ("0\0" "0", 3)  // String concatenation 

And is there some reference for what the escape character does in string literals in general? What is '\a', for instance?

5 answers

#1


48  

Control characters:

(Hex codes assume an ASCII-compatible character encoding.)

  • \a = \x07 = alert (bell)
  • \b = \x08 = backspace
  • \t = \x09 = horizontal tab
  • \n = \x0A = newline (or line feed)
  • \v = \x0B = vertical tab
  • \f = \x0C = form feed
  • \r = \x0D = carriage return
  • \e = \x1B = escape (non-standard GCC extension)
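As a quick check of the table above, the standard control escapes can be compared against their hex codes at compile time. This is only a sketch that assumes an ASCII-compatible execution character set; the function name `control_escapes_match_ascii` is made up for illustration, and the non-standard `\e` is left out on purpose.

```cpp
// Assumption: ASCII-compatible execution character set.
// Hypothetical helper: true when each standard control escape
// has the hex value listed in the table.
constexpr bool control_escapes_match_ascii() {
    return '\a' == '\x07' && '\b' == '\x08' && '\t' == '\x09' &&
           '\n' == '\x0A' && '\v' == '\x0B' && '\f' == '\x0C' &&
           '\r' == '\x0D';
}

static_assert(control_escapes_match_ascii(),
              "control escapes differ from the ASCII values in the table");
```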

Punctuation characters:

  • \" = quotation mark (backslash not required in a character literal: '"')
  • \' = apostrophe (backslash not required in a string literal: "'")
  • \? = question mark (used to avoid trigraphs)
  • \\ = backslash

Numeric character references:

  • \ + up to 3 octal digits
  • \x + any number of hex digits
  • \u + 4 hex digits (Unicode BMP, new in C++11)
  • \U + 8 hex digits (Unicode astral planes, new in C++11)

\0 = \00 = \000 = octal escape for the null character

If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".
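To see how a numeric escape absorbs the digits that follow it, the lengths of a few literals can be checked at compile time. This is a minimal sketch; `literal_length` is a hypothetical helper written for this example, not a standard function.

```cpp
#include <cstddef>

// Hypothetical helper: number of characters in a string literal,
// not counting the implicit terminating null.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;
}

// "\00" is one octal escape, so this literal holds only '0' and '\0'.
static_assert(literal_length("0\00") == 2, "the escape absorbed the digit");

// Splitting the literal ends the escape early: '0', '\0', '0'.
static_assert(literal_length("0\0" "0") == 3, "concatenation keeps the digit");

// An octal escape reads at most three digits, so "\000" followed by '0' works too.
static_assert(literal_length("0\0000") == 3, "three-digit escape, then a digit");

// A hex escape has no digit limit, so splitting is the usual way to stop it.
static_assert(literal_length("\x41" "BC") == 3, "hex escape, then 'B' and 'C'");
```

The same splitting trick is what makes "\0""0" from the answer above two characters long.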

#2


4  

\0 will be interpreted as an octal escape sequence if it is followed by other digits, so \00 will be interpreted as a single character. (\0 is technically an octal escape sequence as well, at least in C).

The way you're doing it:

std::string ("0\0" "0", 3)  // String concatenation 

works because this constructor overload takes a pointer and an explicit length, so it copies all three characters; if you pass just "0\0" "0" as a const char*, the constructor treats it as a C string and copies only the characters before the first null character.
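The difference between the two constructor calls can be sketched as follows. The function names `with_length` and `without_length` are made up for this illustration.

```cpp
#include <string>

// Copies exactly three characters, embedded null included.
inline std::string with_length() {
    return std::string("0\0" "0", 3);
}

// Treats the argument as a C string and stops at the first null,
// so only the leading '0' is copied.
inline std::string without_length() {
    return std::string("0\0" "0");
}
```

Here `with_length().size()` is 3, while `without_length().size()` is 1.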

Here is a list of escape sequences.

#3


4  

\a is the bell/alert character, which on some systems triggers a sound. \nnn represents the character whose code is the octal number nnn. \0 is simply the shortest such escape: it yields the null character, as long as it is not followed by further octal digits.

To answer your original question, you could escape your '0' characters as well, as:

std::string ("\060\000\060", 3);

(since an ASCII '0' is 60 in octal)
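As a sanity check, the fully escaped form can be compared with the concatenated form from the question. This sketch assumes an ASCII-compatible execution character set; the function names are invented for the example.

```cpp
#include <string>

// Assumption: ASCII-compatible execution character set, where '0' is octal 060.
static_assert('\060' == '0', "'0' is not octal 060 on this platform");

// Every character written as a three-digit octal escape.
inline std::string escaped() {
    return std::string("\060\000\060", 3);
}

// The concatenation form from the question.
inline std::string concatenated() {
    return std::string("0\0" "0", 3);
}
```

Under that assumption the two functions return equal strings of length 3.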

The MSDN documentation has a pretty detailed article on this, as does cppreference.

#4


1  

I left something like this as a comment, but I feel it probably needs more visibility as none of the answers mention this method:

The method I now prefer for initializing a std::string with non-printing characters in general (and embedded null characters in particular) is to use the C++11 feature of initializer lists.

std::string const str({'\0', '6', '\a', 'H', '\t'});

I am not required to perform error-prone manual counting of the number of characters that I am using, so that if later on I want to insert a '\013' in the middle somewhere, I can and all of my code will still work. It also completely sidesteps any issues of using the wrong escape sequence by accident.

The only downside is all of those extra ' and , characters.
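The point about not counting characters can be sketched like this, with a '\013' inserted in the middle of the list from the example above. The function name `make_from_list` is hypothetical.

```cpp
#include <string>

// The length comes from the initializer list itself, so inserting
// '\013' in the middle needs no change anywhere else.
inline std::string make_from_list() {
    return std::string({'\0', '6', '\013', '\a', 'H', '\t'});
}
```

The resulting string has six characters, and the embedded null at index 0 is kept.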

#5


0  

With the magic of user-defined literals, we have yet another solution to this. C++14 added a std::string literal operator.

using namespace std::string_literals;
auto const x = "\0" "0"s;

Constructs a string of length 2, with a '\0' character (null) followed by a '0' character (the digit zero). I am not sure if it is more or less clear than the initializer_list<char> constructor approach, but it at least gets rid of the ' and , characters.
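A small sketch that wraps the literal in a function so the result can be inspected; it needs C++14 or later, and the name `make_with_suffix` is made up for the example.

```cpp
#include <string>

// The s suffix applies to the whole concatenated literal, and the
// literal operator receives its length, so the embedded null survives.
inline std::string make_with_suffix() {
    using namespace std::string_literals;
    return "\0" "0"s;
}
```

Calling `make_with_suffix()` gives a string of size 2 whose first character is '\0' and whose second is '0'.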

