Given a string, I need to count how many times each word appears in it. To do so, I split the string into an array of words and searched the array, but I have a feeling that searching the string directly would be more efficient. Below is the code I originally wrote to solve the problem. I'm open to suggestions for better solutions, though.
NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];
NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];
NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];
while (words.count) {
    // Collect the indexes of every word equal to the first remaining word.
    NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
    NSString *search = [words objectAtIndex:0];
    for (NSUInteger i = 0; i < words.count; i++) {
        if ([[words objectAtIndex:i] isEqualToString:search]) {
            [indexSet addIndex:i];
        }
    }
    // Record the count, then remove those words so they are not counted again.
    [sets setObject:[NSNumber numberWithUnsignedInteger:indexSet.count] forKey:search];
    [words removeObjectsAtIndexes:indexSet];
}
NSLog(@"%@", sets);
Example:
Starting string:
"This is a test. This is only a test."
Results:
- "This" - 2
- "is" - 2
- "a" - 2
- "test" - 2
- "only" - 1
3 Solutions
#1
24
This is exactly what an NSCountedSet is for.
You need to break the string apart into words (Foundation is nice enough to give us a method for this, so we don't have to worry about punctuation) and add each of them to the counted set, which keeps track of the number of times each object appears in the set:
NSString *string = @"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
    // This block is called once for each word in the string.
    [countedSet addObject:substring];
    // If you want to ignore case, so that "this" and "This"
    // are counted the same, use this line instead to convert
    // each word to lowercase first:
    // [countedSet addObject:[substring lowercaseString]];
}];
NSLog(@"%@", countedSet);
// Results: 2012-11-13 14:01:10.567 Testing App[35767:fb03]
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
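The log output above lists the counts in no particular order. To read an individual count or to list the words by frequency, something like the following might work. This is only a sketch: `WordCounts` and `WordsByFrequency` are hypothetical helper names introduced here, not part of the answer, and the lowercasing is an assumption about the desired behavior.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer): counts the words in `text`,
// lowercasing each one so that "This" and "this" are counted together.
static NSCountedSet *WordCounts(NSString *text) {
    NSCountedSet *counts = [NSCountedSet new];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *word, NSRange wordRange,
                                       NSRange enclosingRange, BOOL *stop) {
        [counts addObject:[word lowercaseString]];
    }];
    return counts;
}

// Hypothetical helper: returns the distinct words, most frequent first.
// Ties are broken alphabetically so that the order is deterministic.
static NSArray *WordsByFrequency(NSCountedSet *counts) {
    return [[counts allObjects] sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSUInteger countA = [counts countForObject:a];
        NSUInteger countB = [counts countForObject:b];
        if (countA != countB) {
            // A higher count sorts earlier.
            return countA > countB ? NSOrderedAscending : NSOrderedDescending;
        }
        return [a compare:b];
    }];
}
```

With these helpers, `[WordCounts(string) countForObject:@"test"]` gives the count for a single word, and iterating over `WordsByFrequency(counts)` visits the words from most to least frequent.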
#2
2
If I had to guess, I would say NSRegularExpression for that. Like this:
// `regex` is assumed to be an NSRegularExpression created beforehand.
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
                                                    options:0
                                                      range:NSMakeRange(0, [string length])];
That snippet was taken from here.
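The snippet does not show how `regex` is created. As a rough sketch of one way to fill that in, the function below builds a whole-word, case-insensitive pattern for a single word and counts its matches. `CountOccurrencesOfWord` is a hypothetical name, and the `\b` word-boundary pattern is an assumption about what should count as a match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whole-word, case-insensitive occurrences
// of `word` in `text` using NSRegularExpression.
static NSUInteger CountOccurrencesOfWord(NSString *word, NSString *text) {
    // Escape the word so that characters such as "." or "+" match literally,
    // then wrap it in \b boundaries so "is" does not match inside "This".
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return 0;
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, [text length])];
}
```

Note that this counts one word per pass over the string, so it fits best when the words of interest are known in advance.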
Edit 1.0:
Based on what Sir Till said:
NSString *string = @"This is a test, so it is a test";
NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
NSArray *arrayOfWords = [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString *word in arrayOfWords)
{
    // Consecutive separators produce empty strings; skip them.
    if ([word length] == 0)
    {
        continue;
    }
    if ([dictionary objectForKey:word])
    {
        // Word seen before: increment its stored count.
        NSNumber *numberOfOccurrences = [dictionary objectForKey:word];
        NSNumber *increment = [NSNumber numberWithInt:(1 + [numberOfOccurrences intValue])];
        [dictionary setObject:increment forKey:word];
    }
    else
    {
        // First occurrence of this word.
        [dictionary setObject:[NSNumber numberWithInt:1] forKey:word];
    }
}
You should be careful with:
- Punctuation attached to words: here "test," and "test" are counted as different words.
- Uppercase vs. lowercase: "This" and "this" are counted as different words.
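One possible way to handle both caveats is to normalize each token before counting it. The sketch below assumes it is acceptable to lowercase every word and to trim punctuation from both ends of it; `NormalizedWordCounts` is a hypothetical helper name, not something from the answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: whitespace-split word count that lowercases each
// token and trims punctuation from both of its ends before counting.
static NSDictionary *NormalizedWordCounts(NSString *text) {
    NSMutableDictionary *counts = [NSMutableDictionary dictionary];
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSCharacterSet *punctuation = [NSCharacterSet punctuationCharacterSet];
    for (NSString *token in [text componentsSeparatedByCharactersInSet:separators]) {
        NSString *word = [[token stringByTrimmingCharactersInSet:punctuation] lowercaseString];
        if ([word length] == 0) {
            continue; // empty string from repeated separators or bare punctuation
        }
        // Messaging nil returns 0, so an unseen word starts from zero.
        NSUInteger previous = [[counts objectForKey:word] unsignedIntegerValue];
        [counts setObject:[NSNumber numberWithUnsignedInteger:previous + 1] forKey:word];
    }
    return counts;
}
```

Trimming only removes punctuation at the start and end of a token, so an apostrophe inside a word such as "don't" is kept.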
#3
1
I think it's a really bad idea to search for words in a long paragraph with a loop. You should use a regular expression for that! I know it's not easy to learn at first, but it's really worth knowing. Take a look at this question: "Use regular expression to find/replace substring in NSString".
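For illustration, a regular expression can also serve as the tokenizer for counting every word in a single pass. The sketch below assumes that a "word" is a run of `\w` characters and that case should be ignored; `RegexWordCounts` is a hypothetical name. The regular expression still scans the whole string, so this is an alternative way to split words rather than a way to avoid iterating.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts every run of word characters in `text`,
// lowercased, using NSRegularExpression as the tokenizer.
static NSCountedSet *RegexWordCounts(NSString *text) {
    NSCountedSet *counts = [NSCountedSet new];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\w+"
                                                  options:0
                                                    error:NULL];
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, [text length])
                         usingBlock:^(NSTextCheckingResult *match,
                                      NSMatchingFlags flags, BOOL *stop) {
        // Each match covers one word; add its lowercased text to the set.
        NSString *word = [text substringWithRange:[match range]];
        [counts addObject:[word lowercaseString]];
    }];
    return counts;
}
```

With this pattern a contraction such as "don't" is split into "don" and "t", so the pattern would need adjusting if apostrophes matter.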