I don't understand how a wchar_t string ends, and I can't find much information about it.
If it ends with a single null byte, how does the code know that this is really the end of the string and not the zero byte of a character such as "009A", which represents a Unicode symbol?
Or does it end with two null bytes? I'm not sure about that and would like confirmation.
4 Answers
#1
6
Since a wide string is an array of wide characters, it cannot end in a one-byte NUL. The terminator is a NUL wide character, which is two bytes when wchar_t is 16 bits wide. (Arrays in C/C++ can only hold members of the same type, and therefore of the same size.)
Also, every standard ASCII character already contains one or three zero bytes of its own, because the upper bytes of an ASCII character are always zero (one or three depending on whether wchar_t is 16 or 32 bits wide; for simplicity, I assume 16-bit and little-endian):
HELLO is 48 00 45 00 4C 00 4C 00 4F 00 00 00 (hex)
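To see the byte layout on your own machine, here is a minimal sketch that writes out the bytes of a wide string, terminator included. The helper name dump_wide_bytes is made up for this example, and the result depends on the size and byte order of wchar_t on your platform.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes the bytes of the wide string s, terminator
   included, into out as space-separated two-digit hex values.
   Returns the number of bytes dumped, or 0 if out is too small. */
size_t dump_wide_bytes(const wchar_t *s, char *out, size_t out_size)
{
    size_t nbytes = (wcslen(s) + 1) * sizeof(wchar_t); /* +1 for the terminator */
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    /* Each byte needs 3 characters ("XX "), and sprintf also writes a '\0'. */
    if (out_size < nbytes * 3 + 1)
        return 0;

    for (i = 0; i < nbytes; i++)
        sprintf(out + i * 3, "%02X ", (unsigned)p[i]);

    out[nbytes * 3 - 1] = '\0'; /* replace the trailing space */
    return nbytes;
}
```

With a 16-bit little-endian wchar_t (for example on Windows), calling this on L"HELLO" should give "48 00 45 00 4C 00 4C 00 4F 00 00 00". With a 32-bit wchar_t each character takes four bytes, so the string will be twice as long.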
#2
5
In C (quoting the N1570 draft, section 7.1.1):
A wide string is a contiguous sequence of wide characters terminated by and including the first null wide character.
where a "wide character" is a value of type wchar_t, which is defined in <stddef.h> as an integer type.
I can't find a definition of "wide string" in the N3337 draft of the C++ standard, but it should be similar. One minor difference is that wchar_t is a typedef in C, and a built-in type (whose name is a keyword) in C++. But since C++ shares most of the C library, including functions that act on wide strings, it's safe to assume that the C and C++ definitions are compatible. (If someone can find something more concrete in the C++ standard, please comment or edit this paragraph.)
In both C and C++, the size of a wchar_t is implementation-defined. It's typically either 2 or 4 bytes (16 or 32 bits, unless you're on a very exotic system with bytes bigger than 8 bits). A wide string is a sequence of wide characters (wchar_t values), terminated by a null wide character. The terminating wide character will have the same size as any other wide character, typically either 2 or 4 bytes.
In particular, given that wchar_t is bigger than char, a single null byte does not terminate a wide string.
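As a rough illustration, assuming wchar_t is wider than one byte, the sketch below contrasts scanning for the null wide character (what the standard wcslen() does) with scanning for the first zero byte. Both function names are made up for this example.

```c
#include <stddef.h>
#include <wchar.h>

/* Counts wide characters up to, but not including, the first null wide
   character. Same idea as the standard wcslen(). */
size_t wide_length(const wchar_t *s)
{
    size_t n = 0;
    while (s[n] != L'\0')
        n++;
    return n;
}

/* Byte offset of the first zero byte in the storage of s.
   Shown only for contrast: this is NOT how a wide string is terminated. */
size_t first_zero_byte_offset(const wchar_t *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;
    while (p[i] != 0)
        i++;
    return i;
}
```

For L"Hi", wide_length() returns 2, while first_zero_byte_offset() already stops inside the first character, because 'H' (0x48) has zero upper bytes. A byte-wise scan would cut the string short, which is why the terminator has to be a whole wchar_t.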
It's also worth noting that byte order is implementation-defined. A wide character with the value 0x1234, when viewed as a sequence of 8-bit bytes, might appear as any of:
- 0x12, 0x34
- 0x34, 0x12
- 0x00, 0x00, 0x12, 0x34
- 0x34, 0x12, 0x00, 0x00
And those aren't the only possibilities.
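If you want to know which of these layouts your implementation uses, a small probe like the following can tell you. This is only a sketch: the enum and function names are invented here, and anything that is neither plain little-endian nor plain big-endian is reported as ORDER_OTHER.

```c
#include <string.h>
#include <wchar.h>

typedef enum { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER } byte_order;

/* Stores 0x1234 in a wchar_t and inspects its bytes in memory. */
byte_order wchar_byte_order(void)
{
    unsigned char bytes[sizeof(wchar_t)];
    size_t last = sizeof(wchar_t) - 1;
    wchar_t probe;

    if (sizeof(wchar_t) < 2)
        return ORDER_OTHER;          /* 0x1234 would not even fit */

    probe = (wchar_t)0x1234;
    memcpy(bytes, &probe, sizeof probe);

    if (bytes[0] == 0x34 && bytes[1] == 0x12)
        return ORDER_LITTLE;         /* 34 12 or 34 12 00 00 */
    if (bytes[last] == 0x34 && bytes[last - 1] == 0x12)
        return ORDER_BIG;            /* 12 34 or 00 00 12 34 */
    return ORDER_OTHER;
}
```

Code that only uses wchar_t values and the wide string functions never needs this. It matters only when the raw bytes are written to a file or sent over a network.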
#3
4
Here you can read a bit more about wide characters: http://en.wikipedia.org/wiki/Wide_character#Size_of_a_wide_character
The terminator is L'\0', a null wide character. With a 16-bit wchar_t it occupies two bytes, so it looks like two 8-bit null chars.
Remember that "009A" is a single wchar_t with a non-zero value, so it is not a null wide character.
#4
0
If you declare
WCHAR tempWchar[BUFFER_SIZE];
you can set every element to the null wide character:
for (int i = 0; i < BUFFER_SIZE; i++)
    tempWchar[i] = L'\0';  /* L'\0' rather than NULL, which is a pointer constant */
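The same thing can be done without a hand-written loop. The sketch below uses the standard wmemset() from <wchar.h>; it uses plain wchar_t instead of the Windows WCHAR typedef, and the BUFFER_SIZE value and the function name are placeholders chosen for this example.

```c
#include <wchar.h>

#define BUFFER_SIZE 64 /* placeholder size for this example */

/* Sets count wide characters of buf to the null wide character. */
void clear_wide_buffer(wchar_t *buf, size_t count)
{
    wmemset(buf, L'\0', count);
}
```

If the buffer only needs to be empty from the start, an initializer such as wchar_t tempWchar[BUFFER_SIZE] = {0}; zeroes every element at the point of declaration.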