How do I get the first N words from an NSString in Objective-C?

What's the simplest way, given a string:

NSString *str = @"Some really really long string is here and I just want the first 10 words, for example";

to result in an NSString with the first N (e.g., 10) words?

EDIT: I'd also like to make sure it doesn't fail if the str is shorter than N.

4 Solutions

#1

If the words are space-separated:

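NSUInteger nWords = 10;
NSArray *words = [str componentsSeparatedByString:@" "];
// Clamp the length so a string with fewer than nWords words does not raise NSRangeException
NSRange wordRange = NSMakeRange(0, MIN(nWords, words.count));
NSArray *firstWords = [words subarrayWithRange:wordRange];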

If you want to break on all whitespace, build firstWords this way instead:

NSCharacterSet *delimiterCharacterSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
NSArray *firstWords = [[str componentsSeparatedByCharactersInSet:delimiterCharacterSet] subarrayWithRange:wordRange];

Then,

NSString *result = [firstWords componentsJoinedByString:@" "];
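
One thing the split-based approach does not handle by itself is runs of consecutive whitespace: componentsSeparatedByCharactersInSet: returns an empty string between adjacent delimiters, and those empty strings would be counted as words. The following is a minimal sketch, not part of the original answer, that filters them out and clamps the count for short input. The function name FirstWordsBySplitting is a hypothetical helper chosen for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustration only): returns the first maxWords
// whitespace-separated words of str, joined by single spaces.
static NSString *FirstWordsBySplitting(NSString *str, NSUInteger maxWords) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray<NSString *> *parts = [str componentsSeparatedByCharactersInSet:whitespace];

    // Adjacent delimiters produce empty components; drop them so they are not counted.
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    NSArray<NSString *> *words = [parts filteredArrayUsingPredicate:nonEmpty];

    // Clamp so that input with fewer than maxWords words does not raise NSRangeException.
    NSUInteger count = MIN(maxWords, words.count);
    NSArray<NSString *> *firstWords = [words subarrayWithRange:NSMakeRange(0, count)];
    return [firstWords componentsJoinedByString:@" "];
}
```

For example, FirstWordsBySplitting(@"a  b\nc d", 3) yields @"a b c", and asking for 10 words from a two-word string returns both words. Note that the original spacing is normalized to single spaces.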

#2

While Barry Wark's code works well for English, it is not the preferred way to detect word breaks. Many languages, such as Chinese and Japanese, do not separate words using spaces. And German, for example, has many compounds that are difficult to separate correctly.

What you want to use is CFStringTokenizer:

CFStringRef string; // Get string from somewhere
CFLocaleRef locale = CFLocaleCopyCurrent();

CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, string, CFRangeMake(0, CFStringGetLength(string)), kCFStringTokenizerUnitWord, locale);

CFStringTokenizerTokenType tokenType = kCFStringTokenizerTokenNone;
unsigned tokensFound = 0, desiredTokens = 10; // or the desired number of tokens

while(kCFStringTokenizerTokenNone != (tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer)) && tokensFound < desiredTokens) {
  CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
  CFStringRef tokenValue = CFStringCreateWithSubstring(kCFAllocatorDefault, string, tokenRange);

  // Do something with the token
  CFShow(tokenValue);

  CFRelease(tokenValue);

  ++tokensFound;
}

// Clean up
CFRelease(tokenizer);
CFRelease(locale);
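
The loop above only prints each token. To get back an NSString holding the first N words, one option is to remember where the last accepted token ends and take the substring up to that point. This is a rough sketch under a few assumptions: the file is compiled with ARC (hence the __bridge cast), and the function name FirstWordsUsingTokenizer is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustration only): returns the prefix of str that ends
// at the last character of its maxWords-th word, as found by CFStringTokenizer.
static NSString *FirstWordsUsingTokenizer(NSString *str, NSUInteger maxWords) {
    if (str.length == 0 || maxWords == 0) {
        return @"";
    }

    CFStringRef cfStr = (__bridge CFStringRef)str;   // toll-free bridge, no ownership transfer
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault, cfStr,
                                CFRangeMake(0, CFStringGetLength(cfStr)),
                                kCFStringTokenizerUnitWord, locale);

    NSUInteger tokensFound = 0;
    CFIndex end = 0;   // index just past the last accepted word
    while (tokensFound < maxWords &&
           CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        end = tokenRange.location + tokenRange.length;
        ++tokensFound;
    }

    // Clean up the Core Foundation objects created above.
    CFRelease(tokenizer);
    CFRelease(locale);

    return [str substringToIndex:(NSUInteger)end];
}
```

Because the result is a prefix of the original string, the spacing and punctuation between words are preserved, and a string with fewer than N words is returned up to the end of its last word instead of raising an exception.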

#3

Based on Barry's answer, I wrote a function for the sake of this page (still giving him credit on SO):

+ (NSString*)firstWords:(NSString*)theStr howMany:(NSInteger)maxWords {

    NSArray *theWords = [theStr componentsSeparatedByString:@" "];
    if ([theWords count] < maxWords) {
        maxWords = [theWords count];
    }
    // The range length is the word count itself; "maxWords - 1" would drop the last word
    NSRange wordRange = NSMakeRange(0, maxWords);
    NSArray *firstWords = [theWords subarrayWithRange:wordRange];
    return [firstWords componentsJoinedByString:@" "];
}

#4

Here's my solution, derived from the answers given here, for my own problem of removing the first word from a string...

NSMutableArray *words = [NSMutableArray arrayWithArray:[lowerString componentsSeparatedByString:@" "]];
[words removeObjectAtIndex:0];
return [words componentsJoinedByString:@" "];
