iOS - Most efficient way to count the occurrences of each word in a string

Time: 2022-09-25 18:00:20

Given a string, I need to get a count of each word that appears in it. To do so, I split the string into an array of words and searched the array, but I have a feeling that searching the string directly would be more efficient. Below is the code I originally wrote to solve the problem. I'm open to suggestions for better solutions, though.


NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];

NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];

NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];

while (words.count) {
    NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
    NSString *search = [words objectAtIndex:0];
    // Collect the index of every occurrence of the current word.
    for (NSUInteger i = 0; i < words.count; i++) {
        if ([[words objectAtIndex:i] isEqualToString:search]) {
            [indexSet addIndex:i];
        }
    }
    // Record the count, then drop every occurrence so the loop moves on.
    [sets setObject:@(indexSet.count) forKey:search];
    [words removeObjectsAtIndexes:indexSet];
}

NSLog(@"%@", sets);



Starting string:
"This is a test. This is only a test."

Desired result:

  • "This" - 2
  • "is" - 2
  • "a" - 2
  • "test" - 2
  • "only" - 1

3 Solutions



This is exactly what an NSCountedSet is for.


You need to break the string apart into words (Foundation is nice enough to give us enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByWords, so we don't have to worry about punctuation) and just add each of them to the counted set, which keeps track of the number of times each object has been added:


NSString     *string     = @"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];

[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){

                            // This block is called once for each word in the string.
                            [countedSet addObject:substring];

                            // If you want to ignore case, so that "this" and "This" 
                            // are counted the same, use this line instead to convert
                            // each word to lowercase first:
                            // [countedSet addObject:[substring lowercaseString]];
                        }];

NSLog(@"%@", countedSet);

// Results:  2012-11-13 14:01:10.567 Testing App[35767:fb03] 
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
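To get the result into the word/count dictionary the question asks for, you can walk the counted set and ask it for each word's count with countForObject:. A minimal sketch, assuming ARC and an Apple Foundation build; the function name WordCounts and its ignoreCase flag are hypothetical, introduced only for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts words with NSCountedSet and returns them as
// word -> count, the same shape as the `sets` dictionary in the question.
static NSDictionary<NSString *, NSNumber *> *WordCounts(NSString *string, BOOL ignoreCase) {
    NSCountedSet *countedSet = [NSCountedSet new];

    // Word enumeration skips spaces and punctuation such as "." or ",".
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        [countedSet addObject:(ignoreCase ? [substring lowercaseString] : substring)];
    }];

    NSMutableDictionary<NSString *, NSNumber *> *counts = [NSMutableDictionary dictionary];
    // Fast enumeration over a counted set visits each distinct word once.
    for (NSString *word in countedSet) {
        counts[word] = @([countedSet countForObject:word]);
    }
    return [counts copy];
}
```

Each word is added to the set exactly once per occurrence, so the whole count is a single pass over the string rather than the repeated rescans in the question's loop.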



If I had to guess, I would say NSRegularExpression is the tool for that, like this:


// `regex` is an NSRegularExpression created beforehand with the pattern to count.
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
                                                    options:0
                                                      range:NSMakeRange(0, [string length])];

That snippet was taken from here.
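The snippet assumes `regex` already exists. One way to build it for counting a single word is sketched below, assuming ARC and an Apple Foundation build. The function name CountOccurrencesOfWord is hypothetical. It also assumes the word starts and ends with letters or digits, because the \b word-boundary anchors only behave as expected around word characters:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whole-word, case-insensitive occurrences of
// `word` in `string`. The \b anchors keep "test" from matching inside "testing".
static NSUInteger CountOccurrencesOfWord(NSString *word, NSString *string) {
    // Escape the word so characters like "." or "+" are matched literally.
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return 0; // Pattern failed to compile; treat as no matches.
    }
    return [regex numberOfMatchesInString:string
                                  options:0
                                    range:NSMakeRange(0, string.length)];
}
```

This compiles one regex per word you look up, so it suits "how often does X appear?" better than counting every word in the text.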


Edit 1.0:


Based on what Sir Till said:


NSString *string = @"This is a test, so it is a test";

NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
NSArray *arrayOfWords = [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString *word in arrayOfWords) {
    NSNumber *numberOfOccurrences = [dictionary objectForKey:word];
    if (numberOfOccurrences) {
        // Seen before: store the incremented count.
        [dictionary setObject:@([numberOfOccurrences integerValue] + 1) forKey:word];
    } else {
        // First occurrence of this word.
        [dictionary setObject:@1 forKey:word];
    }
}

You should be careful with:


  • Punctuation attached to words (e.g. "test," is counted separately from "test").
  • Uppercase vs. lowercase words (e.g. "This" vs. "this").
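Both caveats can be handled in the same dictionary loop by normalizing each token before counting it. A minimal sketch, assuming ARC and an Apple Foundation build. The function name NormalizedWordCounts is hypothetical. Trimming only strips punctuation at the ends of a token, so "don't" stays one word:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: dictionary-based counting that trims punctuation from
// both ends of each token and lowercases it, so "This" and "this," share a key.
static NSDictionary<NSString *, NSNumber *> *NormalizedWordCounts(NSString *string) {
    NSMutableDictionary<NSString *, NSNumber *> *counts = [NSMutableDictionary dictionary];
    NSArray<NSString *> *tokens =
        [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    for (NSString *token in tokens) {
        NSString *word = [[token stringByTrimmingCharactersInSet:[NSCharacterSet punctuationCharacterSet]]
                          lowercaseString];
        if (word.length == 0) {
            continue; // Empty token from repeated spaces or a lone punctuation mark.
        }
        // counts[word] is nil the first time; messaging nil yields 0.
        counts[word] = @([counts[word] integerValue] + 1);
    }
    return [counts copy];
}
```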



I think it's a really bad idea to search for words in a long paragraph with a loop. You should use a regular expression for that! I know it's not easy to learn at first, but it's really worth knowing. Take a look at this question: Use regular expression to find/replace substring in NSString
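Counting every word with a regular expression can be done in one pass by enumerating matches. This is a sketch assuming ARC and an Apple Foundation build. The function name RegexWordCounts is hypothetical. The pattern \w+ is an assumption about what a "word" is: a run of letters, digits, or underscores. Because of that, "don't" would be split into "don" and "t":

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts every word in a single regex pass. Each match
// of \w+ is treated as one word and lowercased before it is counted.
static NSDictionary<NSString *, NSNumber *> *RegexWordCounts(NSString *string) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\w+" options:0 error:NULL];
    NSMutableDictionary<NSString *, NSNumber *> *counts = [NSMutableDictionary dictionary];

    [regex enumerateMatchesInString:string
                            options:0
                              range:NSMakeRange(0, string.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        NSString *word = [[string substringWithRange:match.range] lowercaseString];
        // counts[word] is nil the first time; messaging nil yields 0.
        counts[word] = @([counts[word] integerValue] + 1);
    }];
    return [counts copy];
}
```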