I'm having trouble matching CJK Extension B characters in an NSString.
Wikipedia, CJK Unified Ideographs Extension B:
CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese.
The Unicode block for these characters runs from U+20000 to U+2A6DF. I'm using the following regex to match CJK Extension B characters:
[\\ud840-\\ud868][\\udc00-\\udfff]|\\ud869[\\udc00-\\uded6]
Here is my code:
NSString *searchedString = @"𠀀"; // First character (U+20000)
NSString *pattern = @"[\\ud840-\\ud868][\\udc00-\\udfff]|\\ud869[\\udc00-\\uded6]";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:nil];
if ([regex numberOfMatchesInString:searchedString options:0 range:NSMakeRange(0, [searchedString length])] > 0) {
    NSLog(@"matches");
} else {
    NSLog(@"doesn't match");
}
Output: doesn't match
For example, if I try something simpler with a Hiragana character, it works:
NSString *searchedString = @"ひ";
NSString *pattern = @"[\\u3040-\\u309F]";
Output: matches
Any help would be much appreciated. Thanks.
1 Answer
#1
2
You can use the \Uxxxxxxxx notation to match Unicode characters outside the BMP (Basic Multilingual Plane).
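For context, an NSString stores characters outside the BMP as UTF-16 surrogate pairs, which is where the \ud840-\ud869 ranges in the question's pattern come from. The following is a minimal sketch of that arithmetic in plain C (which also compiles as Objective-C); the function name `to_surrogates` is made up for illustration and is not part of any Apple or ICU API.

```c
#include <stdint.h>

/* Split a supplementary code point (U+10000..U+10FFFF) into its UTF-16
   surrogate pair. Hypothetical helper, for illustration only.
   Returns 0 on success, -1 if cp is not a supplementary code point. */
int to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return -1;                              /* BMP or out of range */
    }
    uint32_t v = cp - 0x10000;                  /* 20-bit offset */
    *high = (uint16_t)(0xD800 + (v >> 10));     /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));   /* bottom 10 bits */
    return 0;
}
```

With this, U+20000 maps to D840 DC00 and U+2A6DF maps to D869 DEDF; the upper bound \uded6 in the question's pattern corresponds to U+2A6D6. Writing the pattern with \Uhhhhhhhh avoids this arithmetic, since the pattern then names the code points directly.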
According to the ICU regex docs:
\Uhhhhhhhh
Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
So, use
NSString *pattern = @"[\\U00020000-\\U0002A6DF]+";
See the online Obj-C demo
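Because the escape needs exactly eight hex digits, a pattern built at run time has to zero-pad each code point. Here is a small sketch in plain C (also valid Objective-C); `icu_range_class` is a hypothetical helper name, not an ICU or Foundation function.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the character class "[\Ulo-\Uhi]+" into buf, zero-padding each
   code point to the eight hex digits the \U escape requires.
   Hypothetical helper, for illustration only.
   Returns the number of characters written, or -1 on bad input or if
   buf is too small. */
int icu_range_class(uint32_t lo, uint32_t hi, char *buf, size_t size)
{
    if (lo > hi || hi > 0x10FFFF) {
        return -1;                      /* not a valid code point range */
    }
    int n = snprintf(buf, size, "[\\U%08lX-\\U%08lX]+",
                     (unsigned long)lo, (unsigned long)hi);
    if (n < 0 || (size_t)n >= size) {
        return -1;                      /* encoding error or truncation */
    }
    return n;
}
```

Calling `icu_range_class(0x20000, 0x2A6DF, buf, sizeof buf)` fills `buf` with the same pattern text as above. Inside `buf` each backslash is single; the doubled backslashes only appear in source-code string literals. The result could then be wrapped with `[NSString stringWithUTF8String:buf]` before being handed to NSRegularExpression.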