Programmatically tell if a Unicode character takes up more than one character space in a terminal

Date: 2021-07-05 20:12:49

I discovered that in the Mac OS X Terminal, some Unicode characters take up more than one character space. One example is U+27FC (LONG RIGHTWARDS ARROW FROM BAR). It prints two characters wide, but its second half prints on top of whatever the next character is, so you have to write ⟼<space> for it to print correctly. For example, ⟼a prints like this: [screenshot] (I made the font size large so that you could see it, but it happens at all font sizes).

By the way, this is the Menlo font in the Mac OS X 10.6 Terminal application.

U+23B2 (SUMMATION TOP) actually prints two characters wide and two characters tall. It does this in the browser too, at least in Safari; notice how it overlaps the line above: ⎲

However, in the terminal in Ubuntu, none of these characters print wider or taller than one character.

Is there a way to programmatically tell if a character takes up more than one space?

I'm using Python, so something that works either in pure Python or on POSIX (i.e., I can call some bash command using the os module) would be preferred.

Also, I should note that if I increase the "Character Spacing" setting in the font settings of the terminal to 1.5 (from the default 1.0), then it looks like this: [screenshot].

Also, it'd be nice if an answer could give some insight into all of this (i.e., why does it happen?)

3 answers

#1


6  

While it's not relevant for the specific examples you give (all of which display at the size of a single character for me on Ubuntu), CJK characters have a Unicode property, East Asian Width, which indicates that they are wider than normal, and they display at double width in some terminals.

For example, in Python:

# 'a' is a narrow character ('Na'); a plain 'N' would mean neutral
# '愛' can be interpreted as a double-width (wide) character ('W')
import unicodedata
assert unicodedata.east_asian_width('a') == 'Na'
assert unicodedata.east_asian_width('愛') == 'W'

Apart from this, I don't think there's a specification for how much space certain characters should take up, other than the size of the glyph in whatever font you are using (which your terminal is probably ignoring for the reason Ignacio gave).

For more info on the "east asian width" property, see http://www.unicode.org/reports/tr11/
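
To make the property a bit more usable, here is a rough sketch of a width estimate built on it. The helper name `estimated_columns` is made up for illustration, and the rule "W or F means two cells, combining marks mean zero" is an assumption about how many terminals behave, not a guarantee. As noted above, it does not help with the arrows from the question, whose drawn width comes from the font rather than from this property.

```python
import unicodedata


def estimated_columns(text):
    """Estimate how many terminal cells `text` occupies (hypothetical helper)."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Assumption: wide and fullwidth characters take two cells.
            columns += 2
        else:
            columns += 1
    return columns


print(estimated_columns("abc"))   # three narrow characters
print(estimated_columns("愛a"))   # one wide character plus one narrow one
```

A terminal is free to disagree with this estimate, so treat the result as a hint rather than a measurement.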

#2


4  

No, since there's no way to tell what font the terminal is using. Always use a monospace font, lesson learned.

It happens because the terminal is using a "cell" font layout engine (i.e. characters are printed at specific X and Y coordinates regardless of their actual size) whereas the browser is using a "flow" font layout engine (subsequent characters print where the previous character ended).
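
The difference between the two layout models can be sketched in a few lines. This is only a toy model: the function names and the glyph advances (measured in cells) are invented for illustration and are not taken from any real terminal or browser.

```python
def cell_layout(advances):
    """Cell layout: the i-th character always starts at column i."""
    return list(range(len(advances)))


def flow_layout(advances):
    """Flow layout: each character starts where the previous one ended."""
    positions = []
    x = 0
    for advance in advances:
        positions.append(x)
        x += advance
    return positions


# Hypothetical advances: a two-cell-wide arrow followed by 'a' and 'b'.
advances = [2, 1, 1]
print(cell_layout(advances))  # 'a' starts inside the arrow's glyph, so they overlap
print(flow_layout(advances))  # 'a' starts after the arrow, so nothing overlaps
```

In the cell model the arrow's extra width is never accounted for, which matches the overlap described in the question.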

#3


1  

This is a bug in the OS X terminal.

I wouldn't recommend trying to work around it, because the workaround will break on other systems (e.g. Linux), and the bug might eventually get fixed on the Mac. It will also confuse anyone who pastes the text into another application.
