How can plural nouns be converted into singular nouns using R? I use the tagPOS function to tag each text and then extract all the plural nouns, which are tagged as "NNS". But what should I do if I want to convert those plural nouns into singular ones?
library("openNLP")
library("tm")
acq_o <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipelines and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."
# Build a one-document corpus and strip punctuation before tagging
acq <- tm_map(Corpus(DataframeSource(data.frame(acq_o))), removePunctuation)
# Tag every token; the result is a string of "word/TAG" pairs
acqTag <- tagPOS(acq)
acqTagSplit <- strsplit(acqTag, " ")
# Split each "word/TAG" pair and keep the tag (second element)
qq <- strsplit(acqTagSplit[[1]], "/")
tag <- sapply(qq, function(x) x[2])
# Positions of the tokens tagged as plural nouns
index <- which(tag == "NNS")
index
1 Answer
I'm sure you could pipe your data through an external program, or pre-process your data with it.
If you're doing tagging anyway, the German project TreeTagger does a nice job of tagging and lemmatising at the same time.
EDIT: tchrist was right to remind me that, whatever your purposes, if you're actually looking for the singular surface forms of your plural nouns, going for a home-baked solution isn't going to cut it at all.
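To illustrate why a home-baked approach falls short, here is a minimal sketch of a rule-based singularizer in base R. The function name naive_singular and its three suffix rules are hypothetical, made up for this illustration; they are not taken from any package. It handles the regular plurals in the question's text but gets irregular forms wrong.

```r
# Naive suffix-stripping singularizer (hypothetical helper, illustration only).
naive_singular <- function(words) {
  out <- words
  # Rule 1: consonant + "ies" -> "y"   (subsidiaries -> subsidiary)
  ies <- grepl("[^aeiou]ies$", words)
  # Rule 2: sibilant + "es" -> drop "es"   (boxes -> box)
  es <- !ies & grepl("(s|x|z|ch|sh)es$", words)
  # Rule 3: any other trailing "s" (but not "ss") -> drop "s"
  s <- !ies & !es & grepl("[^s]s$", words)
  out[ies] <- sub("ies$", "y", words[ies])
  out[es] <- sub("es$", "", words[es])
  out[s] <- sub("s$", "", words[s])
  out
}

plurals <- c("subsidiaries", "pipelines", "operations",
             "adjustments", "wives", "children")
singulars <- naive_singular(plurals)
print(setNames(singulars, plurals))
# "wives" becomes "wive" and "children" is left unchanged:
# irregular plurals need a dictionary, which is what a lemmatiser provides.
```

The regular nouns come out as expected, but every irregular plural needs its own exception, which is the part that a lexicon-backed lemmatiser already solves.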
And if you don't need real surface forms, then Neo_Me (again, in the comments) seems to have found a package that does stemming in R: the package Snowball. (RStem seems to have been discontinued; AFAICT, Snowball replaces it.)
This is just an implementation of, or wrapper around, the Porter stemmer, of course. Use it at your own risk: it is going to stem words like wives into wif or something like that.
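As a rough sketch of what stemming looks like in practice, the snippet below uses the SnowballC package and its wordStem function. Note that SnowballC is an assumption on my part: it is a different CRAN package from the Snowball package mentioned above, chosen here because it has no Java dependency. The word list is taken from the plural nouns in the question's sample text, plus "wives".

```r
# Stemming sketch; assumes the SnowballC package is installed
# (it is not the Snowball package named in the answer).
words <- c("subsidiaries", "pipelines", "operations", "adjustments", "wives")

if (requireNamespace("SnowballC", quietly = TRUE)) {
  # wordStem returns one stem per input word
  stems <- SnowballC::wordStem(words, language = "english")
  print(data.frame(word = words, stem = stems))
} else {
  stems <- NULL
  message("SnowballC is not installed; try install.packages(\"SnowballC\").")
}
```

Stems are truncated forms meant for matching related words, so they are not guaranteed to be valid singular nouns.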
It just occurred to me that R has CRAN. Searching for "lemma" there made me aware of the Java-dependent package wordnet. It seems to have a getLemma function. The whole package is likely overkill for you, but it might still get you somewhere if you can't find anything better.