I am using R 3.0.1 on Platform: x86_64-apple-darwin10.8.0 (64-bit)
I am trying to use tm_map from the tm library, but when I execute this code:
library(tm)
data('crude')
tm_map(crude, stemDocument)
I get this error:
Warning message:
In parallel::mclapply(x, FUN, ...) :
all scheduled cores encountered errors in user code
Does anyone know a solution for this?
7 Answers
#1
29
I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply. Try running the stemDocument function on just one document, so you can extract the error:
stemDocument(crude[[1]])
In my case, I got this error:
Error in loadNamespace(name) : there is no package called ‘SnowballC’
So I just went ahead and installed SnowballC, and it worked. Clearly, SnowballC should be a dependency.
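For completeness, a minimal sketch of that fix (assuming the SnowballC package is available from CRAN, and using the same crude corpus as above):

```r
# Install the stemmer backend that stemDocument relies on
install.packages("SnowballC")

library(tm)
library(SnowballC)

data("crude")
stemDocument(crude[[1]])     # single-document check: no loadNamespace error now
tm_map(crude, stemDocument)  # the full corpus call should succeed as well
```

Testing on a single document first is the useful habit here: it bypasses mclapply, so the underlying error is reported directly instead of being swallowed by the parallel wrapper.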
#2
17
I just ran into this. It took me a bit of digging but I found out what was happening.
- I had a line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'
- Running this produced the error:
In parallel::mclapply(x, FUN, ...) : all scheduled cores encountered errors in user code
- It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type:
> getOption("mc.cores", 2L)
[1] 2
- Aha moment! Tell the 'tm_map' call to only use one core!
> rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
Error in match.fun(FUN) : object 'asPlainTextDocument' not found
> rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
Warning message:
In parallel::mclapply(x, FUN, ...) :
  all scheduled cores encountered errors in user code
So ... with more than one core, rather than giving you the error message, 'parallel' just tells you there was an error in each core. Not helpful, parallel! I had forgotten the dot: the function name is supposed to be 'as.PlainTextDocument'!
So - if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again.
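In other words, forcing a single core makes the real error surface. A minimal sketch of the debugging pattern, assuming a corpus named rdevel already exists (as.PlainTextDocument is the corrected function name from the answer above):

```r
# With mc.cores = 1, mclapply runs serially, so any error in the
# transformation function propagates with its real message instead of
# the generic "all scheduled cores encountered errors" warning.
rdevel <- tm_map(rdevel, as.PlainTextDocument, mc.cores = 1)
```

Once the call works on one core, removing mc.cores = 1 (or raising it) restores parallel execution.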
#3
11
I found an answer to this that was successful for me in this question: Charles Copley, in his answer, indicates he thinks the new tm package requires lazy = TRUE to be explicitly defined.
So, your code would look like this:
library(tm)
data('crude')
tm_map(crude, stemDocument, lazy = TRUE)
I also tried it without SnowballC to see whether the fix required a combination of those two answers. It did not appear to affect the result either way.
#4
3
I was facing the same issue but finally got it fixed. My guess is that if I name the corpus "longName" or "companyNewsCorpus" I get the issue, but if I use "a" as the corpus name, it works well. Really weird.
The code below gives the same error message mentioned in this thread:
companyNewsCorpus <- Corpus(DirSource("SourceDirectory"),
                            readerControl = list(language = "english"))
companyNewsCorpus <- tm_map(companyNewsCorpus, removeWords, stopwords("english"))
But if I convert it to the code below, it works without issues.
a <- Corpus(DirSource("SourceDirectory"),
            readerControl = list(language = "english"))
a <- tm_map(a, removeWords, stopwords("english"))
#5
3
I ran into the same problem in tm using an Intel quad-core i7 running Mac OS X 10.10.5, and got the following warning:
In mclapply(content(x), FUN, ...) scheduled core 1 encountered error in user code, all values of the job will be affected
I was creating a corpus after downloading Twitter data.
Charles Copley's solution worked for me as well. I used tm_map(*filename*, stemDocument, lazy = TRUE) after creating my corpus, and then tm worked correctly.
#6
1
I also ran into this same issue while using the tm library's removeWords function. Some of the other answers, such as setting the number of cores to 1, did work for removing the set of English stop words; however, I also wanted to remove a custom list of first names and surnames from my corpus, and these lists were upwards of 100,000 words long each.
None of the other suggestions helped with this issue, and it turned out through some trial and error that removeWords seemed to have a limit of 1,000 words per vector. So I wrote this function, which solved the issue for me:
# Let x be a corpus
# Let y be a character vector of words to remove
removeManyWords <- function(x, y) {
    n <- ceiling(length(y) / 1000)  # number of 1000-word chunks
    s <- 1                          # start index of the current chunk
    e <- 1000                       # end index of the current chunk
    for (i in 1:n) {
        # Clamp the end index so the final chunk doesn't read past the vector
        x <- tm_map(x, content_transformer(removeWords), y[s:min(e, length(y))])
        s <- s + 1000
        e <- e + 1000
    }
    x
}
This function essentially counts how many words are in the vector of words I want to remove, then divides that by 1,000 and rounds up to the nearest whole number, n. We then loop n times, removing up to 1,000 words per pass. With this method I didn't need to use lazy = TRUE or change the number of cores, as can be seen from the actual removeWords call in the function. Hope this helps!
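The chunking arithmetic itself can be checked in base R without tm installed. A small sketch using a hypothetical 2,500-entry name list (the generated "name1", "name2", ... values are stand-ins, chosen so the last chunk comes out uneven):

```r
# Stand-in for a real list of names to remove
words <- paste0("name", seq_len(2500))

# Same chunk count as in removeManyWords: ceiling(2500 / 1000) = 3
n <- ceiling(length(words) / 1000)

# Equivalent chunking via split(): group element i by ceiling(i / 1000)
chunks <- split(words, ceiling(seq_along(words) / 1000))

# as.integer() drops the group names, leaving the bare sizes: 1000 1000 500
chunk_sizes <- as.integer(lengths(chunks))
```

split() is an alternative to the manual s/e index bookkeeping: it produces exactly the sub-vectors the loop would, and every word lands in exactly one chunk.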
#7
0
I was working on Twitter data and got the same error as in the original question while trying to convert all text to lowercase with the tm_map() function:
Warning message: In parallel::mclapply(x, FUN, ...) :
all scheduled cores encountered errors in user code
Installing and loading the SnowballC package resolved the problem completely. Hope this helps.