How do I search with regular expressions in NSPredicate while ignoring certain characters?

Date: 2022-05-25 16:49:34

In Hebrew, there are certain vowels that NSPredicate fails to ignore even when using the 'd' (diacritic insensitive) modifier in the predicate. I was told that the solution is to use regular expressions to do the search.

How do I take a search string and "use regex" to search Hebrew text that contains vowels, ignoring those vowels?

Edit:

In other words, if I wanted to search the following text while disregarding dashes and asterisks, how would I do so using regex?

Example Text:

I w-en*t t-o the st*o*r*-e yes-ster*day.

Edit 2:

Essentially, I want to:

  1. Take an input string from a user
  2. Take a string to search
  3. Use a regex based on the user's search string to search for "contains" matches in the larger block of text. The regex should ignore vowels as shown above.

Edit 3:

Here's how I'm implementing my search:

//
//  The user updated the search text
//

- (BOOL)searchDisplayController:(UISearchDisplayController *)controller 
shouldReloadTableForSearchString:(NSString *)searchString{

    NSMutableArray *unfilteredResults = [[[[self.fetchedResultsController sections] objectAtIndex:0] objects] mutableCopy];

    if (self.filteredArray == nil) {
        self.filteredArray = [[[NSMutableArray alloc ] init] autorelease];
    }

    [filteredArray removeAllObjects];

    NSPredicate *predicate;

    if (controller.searchBar.selectedScopeButtonIndex == 0) {
        predicate = [NSPredicate predicateWithFormat:@"articleTitle CONTAINS[cd] %@", searchString];
    }else if (controller.searchBar.selectedScopeButtonIndex == 1) {
        predicate = [NSPredicate predicateWithFormat:@"articleContent CONTAINS[cd] %@", searchString];            
    }else if (controller.searchBar.selectedScopeButtonIndex == 2){
        predicate = [NSPredicate predicateWithFormat:@"ANY tags.tagText CONTAINS[cd] %@", searchString];
    }else{
        predicate = [NSPredicate predicateWithFormat:@"(ANY tags.tagText CONTAINS[cd] %@) OR (dvarTorahTitle CONTAINS[cd] %@) OR (dvarTorahContent CONTAINS[cd] %@)", searchString,searchString,searchString];
    }

    for (Article *article in unfilteredResults) {

        if ([predicate evaluateWithObject:article]) {
            [self.filteredArray addObject:article];
        }

    }

    [unfilteredResults release];


    return YES;
}

Edit 4:

I am not required to use regex for this; I was just advised to do so. If you have another way that works, go for it!

Edit 5:

I've modified my search to look like this:

NSInteger length = [searchString length];

NSString *vowelsAsRegex = @"[\\u5B0-\\u55C4]*";

NSMutableString *modifiedSearchString = [searchString mutableCopy];

for (int i = length; i > 0; i--) {
    [modifiedSearchString insertString:vowelsAsRegex atIndex:i];
}

if (controller.searchBar.selectedScopeButtonIndex == 0) {
    predicate = [NSPredicate predicateWithFormat:@"articleTitle CONTAINS[cd] %@", modifiedSearchString];
}else if (controller.searchBar.selectedScopeButtonIndex == 1) {
    predicate = [NSPredicate predicateWithFormat:@"articleContent CONTAINS[cd] %@", modifiedSearchString];
}else if (controller.searchBar.selectedScopeButtonIndex == 2){
    predicate = [NSPredicate predicateWithFormat:@"ANY tags.tagText CONTAINS[cd] %@", modifiedSearchString];
}else{
    predicate = [NSPredicate predicateWithFormat:@"(ANY tags.tagText CONTAINS[cd] %@) OR (dvarTorahTitle CONTAINS[cd] %@) OR (dvarTorahContent CONTAINS[cd] %@)", modifiedSearchString,modifiedSearchString,modifiedSearchString];
}

for (Article *article in unfilteredResults) {
    if ([predicate evaluateWithObject:article]) {
        [self.filteredArray addObject:article];
    }
}

I'm still missing something here, what do I need to do to make this work?

Edit 6:

Okay, almost there. I need to make two more changes to be finished with this.

I need to be able to add other ranges of characters to the regex, which might appear instead of, or in addition to, the characters in the first range. I've tried changing the first range to this:

[\u05b0-\u05c, \u0591-\u05AF]?

Something tells me that this is incorrect.

Also, I need the rest of the regex to be case insensitive. What modifier do I need to use with the .* regex to make it case insensitive?

2 Solutions

#1


The Hebrew vowels are well defined in Unicode: Table of Hebrew characters and Marks

When you receive the input string from the user, you can insert the regular expression [\u05B0-\u05C4]* in between each character, and before and after the string. (The [] means match any of the included characters, and the * means match zero or more occurrences of the expression.) Then you can search the text block, using this as a regular expression. This expression allows you to find the exact string from the user's input. The user can also specify required vowels, which this expression would find.
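
The approach described above can be sketched with NSRegularExpression. This is a minimal sketch, not the answerer's code: the helper names MarkTolerantPattern and TextContains are hypothetical, and the search string is assumed to contain only Basic Multilingual Plane characters (true for Hebrew letters and marks), since it is split one UTF-16 unit at a time. The set of ignorable characters is passed in as a character class, so the same code covers both the dash-and-asterisk example from the question and the Hebrew vowel range.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a pattern that allows any run of characters from
// `ignorableClass` before, between, and after the characters of `search`.
static NSString *MarkTolerantPattern(NSString *search, NSString *ignorableClass) {
    NSString *filler = [ignorableClass stringByAppendingString:@"*"];
    NSMutableString *pattern = [NSMutableString stringWithString:filler];
    for (NSUInteger i = 0; i < search.length; i++) {
        NSString *ch = [search substringWithRange:NSMakeRange(i, 1)];
        // Escape each character so user input such as "." or "(" is matched literally.
        [pattern appendString:[NSRegularExpression escapedPatternForString:ch]];
        [pattern appendString:filler];
    }
    return pattern;
}

// Hypothetical helper: YES if `text` contains `search` once ignorable characters
// are skipped. The comparison is case insensitive.
static BOOL TextContains(NSString *text, NSString *search, NSString *ignorableClass) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:MarkTolerantPattern(search, ignorableClass)
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    return [regex firstMatchInString:text
                             options:0
                               range:NSMakeRange(0, text.length)] != nil;
}
```

For the question's example, TextContains(@"I w-en*t t-o the st*o*r*-e yes-ster*day.", @"went to the store", @"[\\-*]") should report a match. For Hebrew, the class would be @"[\\u05B0-\\u05C4]", the range given in this answer.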

I think that instead of trying to "ignore" the vowels, it would be easier to remove the vowels from both the large block of text and the user's string. Then you could search just the letters, as usual. This method would work if you don't need to display the vocalized text that the user found.
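
A minimal sketch of this stripping approach follows, assuming the same U+05B0 to U+05C4 range mentioned above. The helper names StripHebrewMarks and StrippedContains are hypothetical. Note that this range also covers a few punctuation characters (maqaf U+05BE, paseq U+05C0, sof pasuq U+05C3), so stripping it joins words that a maqaf separates; narrow the class if that matters for your text.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every character in U+05B0..U+05C4 from `s`.
static NSString *StripHebrewMarks(NSString *s) {
    return [s stringByReplacingOccurrencesOfString:@"[\\u05B0-\\u05C4]"
                                        withString:@""
                                           options:NSRegularExpressionSearch
                                             range:NSMakeRange(0, s.length)];
}

// Hypothetical helper: strips both strings, then does an ordinary
// case-insensitive substring search on the bare letters.
static BOOL StrippedContains(NSString *text, NSString *search) {
    NSString *haystack = StripHebrewMarks(text);
    NSString *needle = StripHebrewMarks(search);
    return [haystack rangeOfString:needle
                           options:NSCaseInsensitiveSearch].location != NSNotFound;
}
```

If the stripped text is stored alongside the original (for example in a separate attribute), the stripping cost is paid once rather than on every keystroke, and the original vocalized text is still available for display.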

#2


This answer picks up where the question left off. Please read that for context.

As it turns out, iOS can make regular expressions case insensitive through the [c] modifier on the NSPredicate comparison operator. All that's left is to combine the two ranges. I realized that they are actually two adjacent ranges, so a single range covers both. My final code looks like this:

NSInteger length = [searchString length];

// Cantillation marks: \u0591-\u05AF. Vowels: \u05b0-\u05c4.
NSString *vowelsAsRegex = @"[\u0591-\u05c4]?[\u0591-\u05c4]?";

NSMutableString *modifiedSearchString = [searchString mutableCopy];

// Insert the optional-marks pattern after every character, working backwards
// so that earlier indexes stay valid.
for (NSInteger i = length; i > 0; i--) {
    [modifiedSearchString insertString:vowelsAsRegex atIndex:i];
}

// MATCHES treats its argument as a regex that must match the whole string,
// so wrap the pattern in .* to get "contains" behavior. (CONTAINS would
// compare the pattern as literal text.)
[modifiedSearchString insertString:@".*" atIndex:0];
[modifiedSearchString appendString:@".*"];

if (controller.searchBar.selectedScopeButtonIndex == 0) {
    predicate = [NSPredicate predicateWithFormat:@"articleTitle MATCHES[c] %@", modifiedSearchString];
}else if (controller.searchBar.selectedScopeButtonIndex == 1) {
    predicate = [NSPredicate predicateWithFormat:@"articleContent MATCHES[c] %@", modifiedSearchString];
}else if (controller.searchBar.selectedScopeButtonIndex == 2){
    predicate = [NSPredicate predicateWithFormat:@"ANY tags.tagText MATCHES[c] %@", modifiedSearchString];
}else{
    predicate = [NSPredicate predicateWithFormat:@"(ANY tags.tagText MATCHES[c] %@) OR (dvarTorahTitle MATCHES[c] %@) OR (dvarTorahContent MATCHES[c] %@)", modifiedSearchString,modifiedSearchString,modifiedSearchString];
}

[modifiedSearchString release];

for (Article *article in unfilteredResults) {
    if ([predicate evaluateWithObject:article]) {
        [self.filteredArray addObject:article];
    }
}

Note that the range portion of the regular expression repeats itself. This is because there can be both a cantillation mark and a vowel on a single letter. Now, I can search uppercase and lowercase English, and Hebrew with or without vowels and cantillation marks.
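
For reference, NSPredicate's CONTAINS operator compares against its right-hand side as literal text, while MATCHES interprets it as an ICU regular expression that has to match the entire string. The sketch below shows one way to get "contains" semantics out of MATCHES. The helper name RegexContainsPredicate is hypothetical, and the fragment passed in is assumed to already be a valid regex (for example, the search string with the optional-marks pattern inserted).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps a regex fragment so that MATCHES behaves like a
// case-insensitive "contains" test on the given key path.
static NSPredicate *RegexContainsPredicate(NSString *keyPath, NSString *fragment) {
    // (?s) lets "." match line breaks too, so multi-line content still matches.
    NSString *wholeString = [NSString stringWithFormat:@"(?s).*%@.*", fragment];
    // %K substitutes a key path; [c] makes the regex comparison case insensitive.
    return [NSPredicate predicateWithFormat:@"%K MATCHES[c] %@", keyPath, wholeString];
}
```

The resulting predicate can be evaluated against any key-value-coding-compliant object, for example [RegexContainsPredicate(@"articleTitle", pattern) evaluateWithObject:article].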

Awesome!
