If I use the following code, the regex group does not contain the expected Unicode string. Can somebody explain whether I made a mistake, or whether this could be an intrinsic problem in Perl itself?
echo 'éá'|perl -ne 'if ( /(\P{L}+)/ ) { print $1; }'
�
Even if I take this explanation into account and add the UTF-8 encoding layers to perl, it still does not give me the string 'éá' for the regex group:
echo 'éá'|perl -CS -ne 'if ( /(\P{L}+)/ ) { print $1,$_; }'
éá
The output for the group seems to be empty apart from a newline character.
Any help is much appreciated.
1 Answer
#1
2
In your input, éá are 2 Unicode letters. \P{L} (uppercase P) is a construct matching any character other than a Unicode letter. With -CS, the only non-letter in your input is the trailing newline added by echo, so that newline is what the group captures.
So using the opposite construct, \p{L} (lowercase p), will fix your issue.
Use
/(\p{L}+)/