For instance, given the following strings:
let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"
I want to detect the language used in each of the declared strings.
Let's assume the signature for the implemented function is:
func detectedLangauge<T: StringProtocol>(_ forString: T) -> String?
It returns an optional string, which is nil when no language is detected.
Thus, the expected results would be:
let englishDetectedLangauge = detectedLangauge(textEN) // => English
let spanishDetectedLangauge = detectedLangauge(textES) // => Spanish
let arabicDetectedLangauge = detectedLangauge(textAR) // => Arabic
let germanDetectedLangauge = detectedLangauge(textDE) // => German
Is there an easy way to achieve this?
2 Solutions
#1
9
Quick Answer:
On iOS 11 and later, you can achieve this by using NSLinguisticTagger. The desired function can be implemented like this:
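import Foundation

func detectedLangauge<T: StringProtocol>(_ forString: T) -> String? {
    // dominantLanguage(for:) returns a BCP-47 language code such as "en".
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
        return nil
    }
    // Convert the code into a display name in the user's current locale.
    let detectedLangauge = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLangauge
}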
This should achieve what you are asking for.
Described Answer:
First of all, you should be aware that what you are asking about mainly relates to the field of natural language processing (NLP).
Since NLP covers far more than language detection, the rest of this answer will not go into NLP specifics.
Obviously, implementing such functionality yourself is not easy, especially once you start to care about the details of the process, such as splitting text into sentences and words and then recognising names, punctuation, and so on. I bet you would think, "What a painful process! It does not even make sense to do it myself." Fortunately, iOS supports NLP (in fact, the NLP APIs are available on all Apple platforms, not only iOS), which makes what you are aiming for easy to implement. The core component that you would work with is NSLinguisticTagger:
Analyze natural language text to tag part of speech and lexical class, identify names, perform lemmatization, and determine the language and script.
NSLinguisticTagger provides a uniform interface to a variety of natural language processing functionality with support for many different languages and scripts. You can use this class to segment natural language text into paragraphs, sentences, or words, and tag information about those segments, such as part of speech, lexical class, lemma, script, and language.
As mentioned in the class documentation, the method that you are looking for, under the "Determining the Dominant Language and Orthography" section, is dominantLanguage(for:):
Returns the dominant language for the specified string.
. . .
Return Value
The BCP-47 tag identifying the dominant language of the string, or the tag "und" if a specific language cannot be determined.
You might notice that NSLinguisticTagger has existed since iOS 5. However, the dominantLanguage(for:) method is only supported on iOS 11 and above, because it was developed on top of the Core ML framework:
. . .
Core ML is the foundation for domain-specific frameworks and functionality. Core ML supports Vision for image analysis, Foundation for natural language processing (for example, the NSLinguisticTagger class), and GameplayKit for evaluating learned decision trees. Core ML itself builds on top of low-level primitives like Accelerate and BNNS, as well as Metal Performance Shaders.
The value returned by calling dominantLanguage(for:) with "The quick brown fox jumps over the lazy dog":
NSLinguisticTagger.dominantLanguage(for: "The quick brown fox jumps over the lazy dog")
would be the optional string "en". However, that is not yet the desired output; the expectation is to get "English" instead. That is exactly what you get by calling the localizedString(forIdentifier:) method of the Locale structure and passing the detected language code (localizedString(forLanguageCode:) is a closely related alternative):
Locale.current.localizedString(forIdentifier: "en") // English
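To make the name lookup a little more concrete, here is a minimal sketch of both Locale methods. It uses a fixed Locale(identifier: "en_US") rather than Locale.current; that is an assumption made only for this illustration, so that the names come out in English regardless of the device settings. The constant names are hypothetical.

```swift
import Foundation

// Assumption: a fixed English locale, so the display names
// do not depend on the user's language settings.
let english = Locale(identifier: "en_US")

// Both calls map a language code to a human-readable name.
let viaIdentifier = english.localizedString(forIdentifier: "en")
let viaLanguageCode = english.localizedString(forLanguageCode: "es")

print(viaIdentifier ?? "no name")
print(viaLanguageCode ?? "no name")
```

With Locale.current, as in the answer, the name is returned in the user's own language instead, which is usually what you want when showing it in the UI.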
Putting it all together:
As mentioned in the "Quick Answer" code snippet, the function would be:
func detectedLangauge<T: StringProtocol>(_ forString: T) -> String? {
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
        return nil
    }
    let detectedLangauge = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLangauge
}
Output:
It would be as expected:
let englishDetectedLangauge = detectedLangauge(textEN) // => English
let spanishDetectedLangauge = detectedLangauge(textES) // => Spanish
let arabicDetectedLangauge = detectedLangauge(textAR) // => Arabic
let germanDetectedLangauge = detectedLangauge(textDE) // => German
Note that:
There are still cases where no meaningful language name is returned for a given string, for example:
let textUND = "SdsOE"
let undefinedDetectedLanguage = detectedLangauge(textUND) // => Unknown language
Or it could even be nil:
let rabish = "000747322"
let rabishDetectedLanguage = detectedLangauge(rabish) // => nil
Still, I find that a reasonable result for providing useful output.
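If you would rather get nil than "Unknown language" in the undetermined case, one option is to filter out the "und" tag before asking Locale for a name. The following is only a sketch under that assumption; detectedLanguageName(for:locale:) and its locale parameter are hypothetical names introduced here, not part of any Apple API.

```swift
import Foundation

/// Hypothetical variant of the function above: returns nil
/// unless a concrete language was detected.
func detectedLanguageName(for text: String, locale: Locale = .current) -> String? {
    guard let code = NSLinguisticTagger.dominantLanguage(for: text),
          code != "und" else {   // "und" is the BCP-47 tag for an undetermined language
        return nil
    }
    return locale.localizedString(forIdentifier: code)
}
```

Callers then only need to handle a single "no language" case instead of checking for both nil and a localized "unknown" string.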
Furthermore:
About NSLinguisticTagger:
Although I am not going to dive deep into NSLinguisticTagger usage, I would like to note that it has a couple of really useful features beyond simply detecting the language of a given text. As a simple example, using the lemma scheme when enumerating tags is helpful in information retrieval, since it lets you match the word "driving" when searching for "drive".
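As a rough illustration of the lemma idea, the sketch below collects the lemma of every word in a string by enumerating tags with the .lemma scheme. The helper name lemmas(in:) is hypothetical, and the exact lemmas returned depend on the system's language models, so treat the result as indicative rather than guaranteed.

```swift
import Foundation

/// Hypothetical helper: returns the lemma of each word in `text`, lowercased.
func lemmas(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]
    var result: [String] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
        if let lemma = tag?.rawValue {
            result.append(lemma.lowercased())
        } else if let wordRange = Range(tokenRange, in: text) {
            // No lemma available for this token: fall back to the word itself.
            result.append(text[wordRange].lowercased())
        }
    }
    return result
}

// A search for "drive" can then be checked against text that contains "driving".
let mentionsDrive = lemmas(in: "She was driving home").contains("drive")
```

Comparing lemmas rather than surface forms is what lets a search for one inflection find the others.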
Official Resources
Apple Video Sessions:
- For more about natural language processing and how NSLinguisticTagger works: Natural Language Processing and your Apps.
Also, to get familiar with Core ML:
- Introducing Core ML.
- Core ML in depth.
#2
0
You can use NSLinguisticTagger's tag(at:scheme:tokenRange:sentenceRange:) method. It supports iOS 5 and later.
func detectLanguage<T: StringProtocol>(for text: T) -> String? {
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = String(text)
    // In Swift 4 and later, tag(at:...) returns an NSLinguisticTag; rawValue is the language code string.
    guard let languageCode = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue else { return nil }
    return Locale.current.localizedString(forIdentifier: languageCode)
}
detectLanguage(for: "The quick brown fox jumps over the lazy dog") // English
detectLanguage(for: "El zorro marrón rápido salta sobre el perro perezoso") // Spanish
detectLanguage(for: "الثعلب البني السريع يقفز فوق الكلب الكسول") // Arabic
detectLanguage(for: "Der schnelle braune Fuchs springt über den faulen Hund") // German
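If you need to support systems older than iOS 11 as well, the two approaches can be combined behind an availability check. This is only a sketch: languageCode(for:) is a hypothetical name, and the empty-string guard is a precaution added here because the fallback asks the tagger for the tag at index 0.

```swift
import Foundation

/// Hypothetical wrapper combining both answers: the newer class method
/// when it is available, the instance-based tagger otherwise.
func languageCode(for text: String) -> String? {
    // Precaution: do not ask the tagger for index 0 of an empty string.
    guard !text.isEmpty else { return nil }
    if #available(iOS 11.0, macOS 10.13, tvOS 11.0, watchOS 4.0, *) {
        return NSLinguisticTagger.dominantLanguage(for: text)
    } else {
        let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
        tagger.string = text
        return tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue
    }
}
```

The returned code can be passed to Locale.current.localizedString(forIdentifier:) exactly as in the functions above.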