😃 (and other Unicode characters) in identifiers not allowed by g++

Date: 2021-06-03 22:33:08

I am 😞 to find that I cannot use 😃 as a valid identifier with g++ 4.7, even with the -fextended-identifiers option enabled:

int main(int argc, const char* argv[])
{
  const char* 😃 = "I'm very happy";
  return 0;
}

main.cpp:3:3: error: stray ‘\360’ in program
main.cpp:3:3: error: stray ‘\237’ in program
main.cpp:3:3: error: stray ‘\230’ in program
main.cpp:3:3: error: stray ‘\203’ in program

After some googling, I discovered that UTF-8 characters are not yet supported in identifiers, but that a universal-character-name should work. So I converted my source to:

int main(int argc, const char* argv[])
{
  const char* \U0001F603 = "I'm very happy";
  return 0;
}

main.cpp:3:15: error: universal character \U0001F603 is not valid in an identifier

So apparently 😃 isn't a valid identifier character. However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and doesn't disallow it as an initial character in E.2. My next effort was to see whether any other allowed Unicode characters worked, but none that I tried did. Not even the ever-important PILE OF POO (💩) character.

So, for the sake of meaningful and descriptive variable names, what gives? Does -fextended-identifiers do as it advertises or not? Is it only supported in the very latest build? And what kind of support do other compilers have?

3 answers

#1


16  

As of 4.8, gcc does not support characters outside the BMP in identifiers. This seems to be an unnecessary restriction. Also, gcc only supports a very restricted set of characters, described in ucnid.tab, based on C99 and C++98 (it seems it has not yet been updated to C11 and C++11).

As described in the manual, -fextended-identifiers is experimental, so there is a higher chance that it won't work as expected.


Edit:

GCC has supported the C11 character set since 4.9.0 (svn r204886, to be precise), so the OP's second piece of code, using \U0001F603, does work. However, I still can't get the actual code using 😃 to work, even with -finput-charset=UTF-8, with GCC 7 on https://gcc.godbolt.org (you may want to follow this bug report, provided by @DanielWolf).

Meanwhile both pieces of code work on clang 3.3 without any options other than -std=c++11.

#2


4  

However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and doesn't disallow it as an initial character in E.2.

One thing to keep in mind is that just because the C++ standard allows (or disallows) some feature, it does not necessarily follow that your compiler supports (or doesn't support) that feature.

#3


4  

This is a known bug in GCC: Bug 67224 - UTF-8 support for identifier names in GCC.

The bug report is from 2015 and has a rather long discussion. At some point, it mentions that "There doesn't seem to be sufficient demand for this feature so that companies fund it or volunteers step up to implement it."

So if you found this topic while looking for a solution, you might want to add to the discussion over there to show that there is, in fact, demand.
