It appears, based on a urwid example that u'\N{HYPHEN BULLET}
will create a unicode character that is a hyphen intended for a bullet.
看来,根据一个urwid示例,你'N {HYPHEN BULLET}将创建一个unicode字符,这是一个用于子弹的连字符。
The names for unicode characters seem to be defined at fileformat.info and some element of using Unicode in Python appears in the howto documentation. Though there is no mention of the \N{}
syntax.
unicode字符的名称似乎是在fileformat.info中定义的,而在Python中使用Unicode的某些元素出现在howto文档中。虽然没有提到\ N {}语法。
If you pull all these docs together you get the idea that the constant u"\N{HYPHEN BULLET}"
creates a ⁃
如果你将所有这些文档拉到一起,你会发现常量u“\ N {HYPHEN BULLET}”创建了一个⁃
However, this is all a theory based on pulling all this data together. I can find no documentation for "\N{}
in the Python docs.
然而,这完全是基于将所有这些数据集中在一起的理论。我在Python文档中找不到“\ N {}的文档。
My question is whether my theory of operation is correct and whether it is documented anywhere?
我的问题是我的操作理论是否正确以及是否记录在任何地方?
3 个解决方案
#1
3
Not every gory detail can be found in a how-to. The table of escape sequences in the reference manual includes:
并非每个血淋淋的细节都可以在方法中找到。参考手册中的转义序列表包括:
Escape Sequence: \N{name}
Meaning: Character named name
in the Unicode database (Unicode only)
转义序列:\ N {name}含义:Unicode数据库中名为name的字符(仅限Unicode)
#2
2
You are correct that u"\N{CHARACTER NAME}
produces a valid unicode character in Python.
你是正确的,“\ N {CHARACTER NAME}在Python中生成一个有效的unicode字符。
It is not documented much in the Python docs, but after some searching I found a reference to it on effbot.org
它没有在Python文档中记录太多,但经过一些搜索后,我在effbot.org上找到了它的引用
http://effbot.org/librarybook/ucnhash.htm
http://effbot.org/librarybook/ucnhash.htm
The ucnhash module
(Implementation, 2.0 only) This module is an implementation module, which provides a name to character code mapping for Unicode string literals. If this module is present, you can use \N{} escapes to map Unicode character names to codes.
(仅限实现,2.0)此模块是一个实现模块,它为Unicode字符串文字的字符代码映射提供名称。如果存在此模块,则可以使用\ N {}转义将Unicode字符名称映射到代码。
In Python 2.1, the functionality of this module was moved to the unicodedata module.
在Python 2.1中,该模块的功能已移至unicodedata模块。
Checking the documentation for unicodedata
shows that the module is using the data from the Unicode Character Database.
检查unicodedata的文档表明该模块正在使用Unicode字符数据库中的数据。
unicodedata — Unicode Database
This module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters. The data contained in this database is compiled from the UCD version 9.0.0.
此模块提供对Unicode字符数据库(UCD)的访问,该数据库定义所有Unicode字符的字符属性。此数据库中包含的数据是从UCD版本9.0.0编译的。
The full data can be found at: https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt
完整数据可在以下网址找到:https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt
The data has the structure: HEXVALUE;CHARACTER NAME;etc..
so you could use this data to look up characters.
数据具有以下结构:HEXVALUE; CHARACTER NAME;等等。因此您可以使用此数据查找字符。
For example:
例如:
# 0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
>>> u"\N{LATIN CAPITAL LETTER A}"
'A'
# FF7B;HALFWIDTH KATAKANA LETTER SA;Lo;0;L;<narrow> 30B5;;;;N;;;;;
>>> u"\N{HALFWIDTH KATAKANA LETTER SA}"
'サ'
#3
2
The \N{}
syntax is documented in the Unicode HOWTO, at least.
至少,Unicode HOWTO中记录了\ N {}语法。
The names are documented in the Unicode standard, such as:
这些名称记录在Unicode标准中,例如:
http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt
The unicodedata
module can look up a name for a character:
unicodedata模块可以查找角色的名称:
>>> import unicodedata as ud
>>> ud.name('A')
'LATIN CAPITAL LETTER A'
>>> print('\N{LATIN CAPITAL LETTER A}')
A
#1
3
Not every gory detail can be found in a how-to. The table of escape sequences in the reference manual includes:
并非每个血淋淋的细节都可以在方法中找到。参考手册中的转义序列表包括:
Escape Sequence: \N{name}
Meaning: Character named name
in the Unicode database (Unicode only)
转义序列:\ N {name}含义:Unicode数据库中名为name的字符(仅限Unicode)
#2
2
You are correct that u"\N{CHARACTER NAME}
produces a valid unicode character in Python.
你是正确的,“\ N {CHARACTER NAME}在Python中生成一个有效的unicode字符。
It is not documented much in the Python docs, but after some searching I found a reference to it on effbot.org
它没有在Python文档中记录太多,但经过一些搜索后,我在effbot.org上找到了它的引用
http://effbot.org/librarybook/ucnhash.htm
http://effbot.org/librarybook/ucnhash.htm
The ucnhash module
(Implementation, 2.0 only) This module is an implementation module, which provides a name to character code mapping for Unicode string literals. If this module is present, you can use \N{} escapes to map Unicode character names to codes.
(仅限实现,2.0)此模块是一个实现模块,它为Unicode字符串文字的字符代码映射提供名称。如果存在此模块,则可以使用\ N {}转义将Unicode字符名称映射到代码。
In Python 2.1, the functionality of this module was moved to the unicodedata module.
在Python 2.1中,该模块的功能已移至unicodedata模块。
Checking the documentation for unicodedata
shows that the module is using the data from the Unicode Character Database.
检查unicodedata的文档表明该模块正在使用Unicode字符数据库中的数据。
unicodedata — Unicode Database
This module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters. The data contained in this database is compiled from the UCD version 9.0.0.
此模块提供对Unicode字符数据库(UCD)的访问,该数据库定义所有Unicode字符的字符属性。此数据库中包含的数据是从UCD版本9.0.0编译的。
The full data can be found at: https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt
完整数据可在以下网址找到:https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt
The data has the structure: HEXVALUE;CHARACTER NAME;etc..
so you could use this data to look up characters.
数据具有以下结构:HEXVALUE; CHARACTER NAME;等等。因此您可以使用此数据查找字符。
For example:
例如:
# 0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
>>> u"\N{LATIN CAPITAL LETTER A}"
'A'
# FF7B;HALFWIDTH KATAKANA LETTER SA;Lo;0;L;<narrow> 30B5;;;;N;;;;;
>>> u"\N{HALFWIDTH KATAKANA LETTER SA}"
'サ'
#3
2
The \N{}
syntax is documented in the Unicode HOWTO, at least.
至少,Unicode HOWTO中记录了\ N {}语法。
The names are documented in the Unicode standard, such as:
这些名称记录在Unicode标准中,例如:
http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt
The unicodedata
module can look up a name for a character:
unicodedata模块可以查找角色的名称:
>>> import unicodedata as ud
>>> ud.name('A')
'LATIN CAPITAL LETTER A'
>>> print('\N{LATIN CAPITAL LETTER A}')
A