Are "char16_t" and "char32_t" misnamed?

Date: 2021-07-29 21:28:31

NB: I'm sure someone will call this subjective, but I reckon it's fairly tangible.


C++11 gives us new basic_string types std::u16string and std::u32string, type aliases for std::basic_string<char16_t> and std::basic_string<char32_t>, respectively.
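To make the alias relationship concrete, here is a minimal sketch, assuming a C++11 (or later) compiler. The helper name `aliases_match` is made up for illustration and is not part of the standard library.

```cpp
#include <string>
#include <type_traits>

// Each alias names exactly one std::basic_string specialisation.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");

// Hypothetical helper that folds both checks into one compile-time value.
constexpr bool aliases_match() {
    return std::is_same<std::u16string, std::basic_string<char16_t>>::value &&
           std::is_same<std::u32string, std::basic_string<char32_t>>::value;
}
```

Nothing here runs at run time; the translation unit simply fails to compile if either relationship does not hold.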


To me, the substrings "u16" and "u32" in this context rather imply "UTF-16" and "UTF-32", which would be silly since C++ of course has no concept of text encodings.


The names in fact reflect the character types char16_t and char32_t, but these seem misnamed. They are unsigned, due to the unsignedness of their underlying types:


[C++11: 3.9.1/5]: [..] Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively [..]
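The quoted wording can be checked directly. This is only a sketch, assuming a C++11 compiler; `char16_matches_quote` is a hypothetical name introduced for this example.

```cpp
#include <cstdint>
#include <type_traits>

// Same size, signedness and alignment as the uint_least typedefs ...
static_assert(std::is_unsigned<char16_t>::value, "char16_t is unsigned");
static_assert(std::is_unsigned<char32_t>::value, "char32_t is unsigned");
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "same size");
static_assert(alignof(char32_t) == alignof(std::uint_least32_t), "same alignment");

// ... yet still distinct types, not typedefs for them.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value, "distinct type");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value, "distinct type");

// Hypothetical helper summarising the char16_t half of the quote.
constexpr bool char16_matches_quote() {
    return std::is_unsigned<char16_t>::value &&
           sizeof(char16_t) == sizeof(std::uint_least16_t) &&
           !std::is_same<char16_t, std::uint_least16_t>::value;
}
```

Note that the comparison is with `uint_least16_t`, so the "16" promises at least 16 bits rather than exactly 16.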


But then it seems to me that these names violate two conventions: that such unsigned types have names beginning with 'u', and that a number like 16, unqualified by a term like "least", indicates a fixed-width type.


My question, then, is this: am I imagining things, or are these names fundamentally flawed?


3 answers

#1


11  

The naming convention to which you refer (uint32_t, int_fast32_t, etc.) is actually only used for typedefs, and not for primitive types. The primitive integer types are {signed, unsigned} {char, short, int, long, long long} (as opposed to float or decimal types) ...


However, in addition to those integer types, there are four distinct fundamental types, char, wchar_t, char16_t and char32_t, which are the types of the respective literals '', L'', u'' and U'' and are used for alphanumeric data, and similarly for arrays of those. Those types are of course also integer types, and thus they will have the same layout as some of the arithmetic integer types, but the language makes a very clear distinction between the former, arithmetic types (which you would use for computations) and the latter "character" types, which form the basic unit of some type of I/O data.
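As a rough illustration of that distinction (a sketch assuming a C++11 compiler; the overload set `describe` is hypothetical and exists only for this example), the literal prefixes each produce their own type, and overload resolution keeps a character type apart from the arithmetic typedef that shares its layout:

```cpp
#include <cstdint>
#include <type_traits>

// Each literal prefix yields its own character type.
static_assert(std::is_same<decltype('a'),  char>::value,     "'' is char");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value,  "L'' is wchar_t");
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u'' is char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U'' is char32_t");

// Hypothetical overload set: legal only because the two parameter
// types are distinct, even though they share size and signedness.
constexpr int describe(char16_t)            { return 1; }  // "character" type
constexpr int describe(std::uint_least16_t) { return 2; }  // arithmetic typedef

static_assert(describe(u'x') == 1, "a u'' literal selects the char16_t overload");
static_assert(describe(std::uint_least16_t(120)) == 2,
              "an arithmetic value selects the uint_least16_t overload");
```

If `char16_t` were merely a typedef for `uint_least16_t`, the two `describe` overloads would be a redefinition error.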


(I've previously rambled about those new types here and here.)


So, I think that char16_t and char32_t are actually very aptly named to reflect the fact that they belong to the "char" family of integer types.


#2


4  

are these names fundamentally flawed?


(I think most of this question has been answered in the comments, but to make an answer:) No, not at all. char16_t and char32_t were created for a specific purpose: to provide data type support for all Unicode encoding formats (UTF-8 is covered by char) while keeping them generic enough not to be limited to Unicode. Whether they are unsigned or fixed-width is not directly related to what they are: character data types, that is, types which hold and represent characters. Signedness is a property of data types that represent numbers, not characters. The types are meant to store 16-bit or 32-bit based character data, nothing more and nothing less.
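To show what "16 bit or 32 bit based character data" means in practice, here is a small sketch. It assumes a C++11 compiler that encodes u"" literals as UTF-16 and U"" literals as UTF-32, which is what the common implementations do; `code_units` is a hypothetical helper, not a standard function.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 fits in a single 16-bit unit.
static_assert(code_units(u"\u00E9") == 1, "one char16_t unit");

// U+1F600 lies outside the 16-bit range: two char16_t units
// (a surrogate pair), but a single char32_t unit.
static_assert(code_units(u"\U0001F600") == 2, "two char16_t units");
static_assert(code_units(U"\U0001F600") == 1, "one char32_t unit");
```

So an element of a char16_t string is a 16-bit unit of character data and not necessarily a whole character, which fits the view that the number in the name describes the storage unit.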


#3


-3  

They are not fundamentally flawed, by definition - they are part of the standard. If that offends your sensibilities then you must find a way to deal with it. The time to make this argument was before the latest standard was ratified, and that time has long passed.

