What are char16_t and char32_t, and where can I find them?

Date: 2022-12-17 15:07:29

I was looking for char16_t and char32_t, since I’m working with Unicode, and all I could find on the Web was they were inside uchar.h. I found said header inside the iOS SDK (not the macOS one, for some reason), but there were no such types in it. I saw them in a different header, though, but I could not find where they're defined. Also, the info on the internet is scarce at best, so I’m kinda lost here; but I did read wchar_t should not be used for Unicode, which is exactly what I’ve been doing so far, so please help:(

2 Answers

#1


1  

char16_t and char32_t are specified in the C standard. (Citations below are from the 2018 standard.)

Per clause 7.28, the header <uchar.h> declares them as unsigned integer types to be used for 16-bit and 32-bit characters, respectively. You should not have to hunt for them in any other header; #include <uchar.h> should suffice.
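
As a minimal sketch of what including the header buys you, assuming a toolchain that actually ships <uchar.h> (the question suggests some Apple SDKs may not), the two types can be used like any other integer type. The helper names u16_strlen and u32_strlen are hypothetical and not part of the standard library.

```c
#include <stddef.h>  /* size_t */
#include <uchar.h>   /* char16_t, char32_t (C11 and later) */

/* Hypothetical helper: number of char16_t code units before the
   terminating zero. This counts code units, not characters. */
size_t u16_strlen(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: the same count for a char32_t string. */
size_t u32_strlen(const char32_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Calling u16_strlen(u"abc") or u32_strlen(U"abc") gives 3 in both cases, since each of those characters occupies one code unit.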

Also per clause 7.28, each of these types is the narrowest unsigned integer type with at least the required number of bits: char16_t is the same type as uint_least16_t, and char32_t is the same type as uint_least32_t. (For example, on an implementation that supported only unsigned integers of 8, 18, 24, 36, and 50 bits, char16_t would have to be the 18-bit type; it could not be the 24-bit one, and char32_t would have to be the 36-bit type.)
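
The "same type as uint_least16_t / uint_least32_t" relationship can be checked at compile time. This is a sketch assuming a C11 compiler with _Generic and static_assert; the two width functions are hypothetical helpers added only so the sizes can be inspected at run time.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>  /* uint_least16_t, uint_least32_t */
#include <uchar.h>   /* char16_t, char32_t */

/* Compilation fails here if the implementation's char16_t / char32_t
   are not the least-width types that clause 7.28 describes. */
static_assert(_Generic((char16_t)0, uint_least16_t: 1, default: 0),
              "char16_t should be the same type as uint_least16_t");
static_assert(_Generic((char32_t)0, uint_least32_t: 1, default: 0),
              "char32_t should be the same type as uint_least32_t");

/* Hypothetical helpers: storage size of each type in bits. */
unsigned char16_storage_bits(void)
{
    return (unsigned)(sizeof(char16_t) * CHAR_BIT);
}

unsigned char32_storage_bits(void)
{
    return (unsigned)(sizeof(char32_t) * CHAR_BIT);
}
```

On common desktop platforms these report 16 and 32, but the standard only guarantees "at least" those widths.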

Per clause 6.4.5, when a string literal is prefixed by u or U, as in u"abc" or U"abc", it is a wide string literal in which the elements have type char16_t or char32_t, respectively.
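
A short sketch of the two literal prefixes follows. The array and function names are made up for illustration; only the u"..." and U"..." syntax comes from the standard.

```c
#include <stddef.h>  /* size_t */
#include <uchar.h>   /* char16_t, char32_t */

/* u"..." produces an array of char16_t, U"..." an array of char32_t.
   Both arrays include a terminating zero element. */
static const char16_t sample16[] = u"abc";
static const char32_t sample32[] = U"abc";

/* Element counts, excluding the terminating zero. */
size_t sample16_length(void)
{
    return sizeof sample16 / sizeof sample16[0] - 1;
}

size_t sample32_length(void)
{
    return sizeof sample32 / sizeof sample32[0] - 1;
}
```

Both functions return 3 here. The arrays differ only in element type, so sample32 typically occupies twice as many bytes as sample16.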

Per clause 6.10.8.2, if the C implementation defines the preprocessor macro __STDC_UTF_16__ to be 1, it indicates that char16_t values are UTF-16 encoded. Similarly, __STDC_UTF_32__ indicates char32_t values are UTF-32 encoded. In the absence of these macros, no assertion is made about the encodings.
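
One way to act on these macros is to test them with the preprocessor. In the sketch below all three function names are hypothetical; the last one is a simple sanity check that, when UTF-16 is claimed, the euro sign U+20AC is stored as the single code unit 0x20AC.

```c
#include <stdbool.h>
#include <uchar.h>   /* char16_t, char32_t */

/* True if the implementation asserts that char16_t values are UTF-16. */
bool char16_claims_utf16(void)
{
#if defined(__STDC_UTF_16__)
    return true;
#else
    return false;
#endif
}

/* True if the implementation asserts that char32_t values are UTF-32. */
bool char32_claims_utf32(void)
{
#if defined(__STDC_UTF_32__)
    return true;
#else
    return false;
#endif
}

/* If UTF-16 is claimed, U+20AC must be exactly one code unit, 0x20AC.
   If nothing is claimed, there is nothing to contradict. */
bool utf16_claim_is_consistent(void)
{
    static const char16_t euro[] = u"\u20AC";
    if (!char16_claims_utf16()) {
        return true;
    }
    return euro[0] == 0x20AC && euro[1] == 0;
}
```

Code that depends on a particular encoding could refuse to build (for example with #error) when the corresponding macro is absent, rather than guessing.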

#2


0  

Microsoft has a fair description: https://docs.microsoft.com/en-us/cpp/cpp/char-wchar-t-char16-t-char32-t?view=vs-2017

  • char is the original, typically 8-bit, character representation.

  • wchar_t is a "wide char"; it is 16 bits on Windows. Microsoft was an early adopter of Unicode, which unfortunately stuck them with this largely Windows-specific 16-bit encoding.

  • char16_t and char32_t are used for UTF-16 and UTF-32.

Most non-Windows systems use UTF-8 for encoding (and even Windows 10 is adopting this, https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8). UTF-8 is by far the most common encoding used today on the web. (ref: https://en.wikipedia.org/wiki/UTF-8)

UTF-8 is stored in a series of chars. UTF-8 is likely the encoding you will find simplest to adopt, depending on your OS.
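
To illustrate how one code point becomes a series of chars, here is a hand-written encoder. It is a sketch, not a library API: the name utf8_encode and its signature are assumptions made for this example.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint_least32_t */

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns how many were written,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint_least32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      /* surrogates are not valid code points to encode */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, the euro sign U+20AC encodes to the three bytes E2 82 AC, while plain ASCII characters stay a single byte, which is why existing char-based string code keeps working with UTF-8 text.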
