tm_map has parallel::mclapply error in R 3.0.1 on Mac

Date: 2021-02-02 13:50:32

I am using R 3.0.1 on Platform: x86_64-apple-darwin10.8.0 (64-bit)


I am trying to use tm_map from the tm library. But when I execute this code:


tm_map(crude, stemDocument)

I get this error:


Warning message:
In parallel::mclapply(x, FUN, ...) :
  all scheduled cores encountered errors in user code

Does anyone know a solution for this?


7 solutions



I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply. Try running stemDocument on just one document (for example, crude[[1]]) so you can see the actual error.



For me, I got an error:


Error in loadNamespace(name) : there is no package called ‘SnowballC’

So I just went ahead and installed SnowballC and it worked. Clearly, SnowballC should be a dependency.
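The check described above can be sketched roughly as follows. This is only an illustration: the package list in `needed` is an assumption drawn from the error message, and the single-document call assumes the `crude` example corpus that ships with tm.

```r
# Packages the stemming step appears to rely on (an assumption based on
# the loadNamespace error shown above).
needed <- c("tm", "SnowballC")

# requireNamespace() returns FALSE instead of failing when a package is absent.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  # Stem a single document outside tm_map, so any error is reported directly
  # rather than being hidden behind the mclapply warning.
  data("crude", package = "tm")
  print(tm::stemDocument(crude[[1]]))
}
```

If either package is reported as missing, installing it and re-running the original tm_map call is the obvious next step.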




I just ran into this. It took me a bit of digging but I found out what was happening.


  1. I had a line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'


  2. Running this produced the error


    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code

  3. It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type

    > getOption("mc.cores", 2L)
    [1] 2

  4. Aha moment! Tell the 'tm_map' call to only use one core!

    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
    Error in : object 'asPlainTextDocument' not found
    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
    Warning message:
    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code

So ... with more than one core, 'parallel' does not pass on the real error message; it only tells you there was an error in each core. Not helpful, parallel! With one core the real error shows up: I forgot the dot, and the function name is supposed to be 'as.PlainTextDocument'!


So - if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again.
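The masking effect can be reproduced without tm at all. The following is a minimal sketch using only the parallel package that ships with R; `broken_transform` and `asPlainText` are hypothetical names standing in for a transformation with a typo, and the multi-core part assumes a Unix-alike system (forking is not available on Windows).

```r
library(parallel)  # part of the standard R distribution

# Hypothetical stand-in for a transformation with a typo in it:
# `asPlainText` is deliberately undefined.
broken_transform <- function(doc) asPlainText(doc)

docs <- list("first document", "second document")

# With several workers, failures come back as "try-error" objects and the
# only thing reported is the generic warning (suppressed here).
if (.Platform$OS.type == "unix") {
  res <- suppressWarnings(mclapply(docs, broken_transform, mc.cores = 2))
  print(vapply(res, inherits, logical(1), what = "try-error"))
}

# Running the same function serially on one element exposes the real message.
msg <- tryCatch(broken_transform(docs[[1]]),
                error = function(e) conditionMessage(e))
print(msg)
```

Since mclapply takes its default from getOption("mc.cores", 2L), setting options(mc.cores = 1) for the session should have a similar effect for code that relies on that default, though whether a given tm version honours it is an assumption worth checking.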




I found an answer that worked for me in another question: Charles Copley, in his answer there, says he thinks the new tm package requires lazy = TRUE to be set explicitly.


So, your code would look like this:


tm_map(crude, stemDocument, lazy = TRUE)

I also tried it without SnowballC to see if it was a combination of those two answers. It did not appear to affect the result either way.




I have been facing the same issue but finally got it fixed. My guess is that the corpus name matters: if I name the corpus "longName" or "companyNewsCorpus", I get the error, but if I name it "a", it works well. Really weird.


The code below gives the same error message mentioned in this thread:


companyNewsCorpus <- Corpus(DirSource("SourceDirectory"),
                            readerControl = list(language="english"))
companyNewsCorpus <- tm_map(companyNewsCorpus,
                            removeWords, stopwords("english"))

But if I change it to the following, it works without issues:


a <- Corpus(DirSource("SourceDirectory"),
            readerControl = list(language="english"))
a <- tm_map(a, removeWords, stopwords("english"))



I ran into the same problem in tm on a quad-core Intel i7 running Mac OS X 10.10.5, and got the following warning:


In mclapply(content(x), FUN, ...) scheduled core 1 encountered error in user code, all values of the job will be affected


I was creating a corpus after downloading Twitter data.


Charles Copley's solution worked for me as well. I used: tm_map(*filename*, stemDocument, lazy = TRUE) after creating my corpus and then tm worked correctly.




I also ran into this same issue while using the tm library's removeWords function. Some of the other answers, such as setting the number of cores to 1, did work for removing the set of English stop words. However, I also wanted to remove a custom list of first names and surnames from my corpus, and these lists were upwards of 100,000 words long each.


None of the other suggestions helped with this. Through some trial and error, it turned out that removeWords seemed to have a limit of 1000 words in a vector. So I wrote this function, which solved the issue for me:


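# Let x be a corpus
# Let y be a vector containing words to remove
removeManyWords <- function (x, y) {

      # Number of chunks of at most 1000 words
      n <- ceiling(length(y)/1000)
      s <- 1
      e <- 1000

      for (i in seq_len(n)) {

            # Clamp the end index so the last chunk does not run past y
            x <- tm_map(x, content_transformer(removeWords), y[s:min(e, length(y))])
            s <- s + 1000
            e <- e + 1000

      }

      x
}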



This function counts the words in the vector I want to remove, divides that count by 1000, and rounds up to the nearest whole number, n. It then loops n times, removing one chunk of up to 1000 words per pass. With this method I didn't need to use lazy = TRUE or change the number of cores, as can be seen from the actual removeWords call in the function. Hope this helps!
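One thing to watch when chunking by index is that R pads out-of-range indices with NA, so a final chunk that runs past the end of the vector picks up NA entries. A small base-R sketch of an alternative, using a hypothetical helper `chunk_words` built on split(); the chunk size of 1000 is taken from the observation above, not from any documented limit:

```r
# Hypothetical helper: split a word vector into chunks of at most `size` words.
chunk_words <- function(words, size = 1000) {
  split(words, ceiling(seq_along(words) / size))
}

# Made-up word list, just for illustration.
words <- paste0("name", 1:2500)

# Indexing past the end of the vector silently yields NA entries.
overrun <- words[2001:3000]
print(sum(is.na(overrun)))

# split() only ever returns elements that exist, so the last chunk is shorter.
chunks <- chunk_words(words)
print(lengths(chunks))
```

Each element of `chunks` could then be passed to removeWords in turn, in the same way the loop above passes one slice per pass.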




I was working on Twitter data and got the same error as in the original question while trying to convert all text to lowercase with the tm_map() function:


Warning message: In parallel::mclapply(x, FUN, ...) :   
all scheduled cores encountered errors in user code

Installing and loading package SnowballC resolved the problem completely. Hope this helps.




