I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:
> RF <- randomForest(prePrior1, postPrior1[,6]
+ ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE, :
NA/NaN/Inf in foreign function call (arg 1)
So I tried to find any NAs using:
> df2 <- prePrior1[is.na(prePrior1)]
> df2
character(0)
> df2 <- postPrior1[is.na(postPrior1[,6])]
> df2
numeric(0)
which leads me to believe that it's Inf's that are the problem as there don't seem to be any NA's.
Any suggestions for how to root out Inf's?
5 answers
#1
22
You're probably looking for is.finite, though I'm not 100% certain that the problem is Infs in your input data.
Be sure to read the help for is.finite carefully about which combinations of missing, infinite, etc. it picks out. Specifically, this:
> is.finite(c(1,NA,-Inf,NaN))
[1] TRUE FALSE FALSE FALSE
> is.infinite(c(1,NA,-Inf,NaN))
[1] FALSE FALSE TRUE FALSE
One of these things is not like the others. Not surprisingly, there's an is.nan function as well.
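To apply this to a whole data frame rather than a single vector, one option is to count the non-finite entries column by column. This is only a sketch: `toy` is a made-up stand-in for prePrior1, and it assumes every column is numeric.

```r
# Hypothetical toy data standing in for the real predictors (all numeric)
toy <- data.frame(
  a = c(1, 2, Inf, 4),
  b = c(0.5, NaN, 1.5, NA),
  c = c(10, 20, 30, 40)
)

# Count entries that are NA, NaN, Inf or -Inf in each column
bad_counts <- sapply(toy, function(col) sum(!is.finite(col)))
print(bad_counts)

# Names of the columns that contain at least one non-finite value
bad_cols <- names(bad_counts)[bad_counts > 0]
print(bad_cols)
```

Any column that shows up in `bad_cols` is worth inspecting before calling randomForest again.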
#2
10
randomForest's 'NA/NaN/Inf in foreign function call' is often a misleading error message, and really irritating:
- you will get this if any of the variables passed is character
- actual NaNs and Infs almost never happen in clean data
A fast and dirty trick to narrow things down: do a binary search on your variable list, and use token parameters like ntree=2 to get an instant pass/fail on each subset of variables:
RF <- randomForest(prePrior1[m:n],ntree=2,...)
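Before bisecting, it may be quicker to look at the column classes directly. The `character(0)` printed in the question hints that prePrior1 might contain non-numeric columns, though that is only a guess. The sketch below uses a made-up data frame `toy`, with a hypothetical `id` column stored as character by mistake.

```r
# Hypothetical toy predictors; 'id' was read in as character by mistake
toy <- data.frame(
  x1 = c(1.2, 3.4, 5.6),
  id = c("7", "8", "9"),
  x2 = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Class of every column, to spot anything that is not numeric
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the character columns
char_cols <- names(toy)[sapply(toy, is.character)]
print(char_cols)

# One possible fix, only appropriate if the strings really encode numbers;
# strings that are not numbers would become NA with a warning
toy[char_cols] <- lapply(toy[char_cols], as.numeric)
```

If a character column is really categorical, converting it with as.factor is probably the better choice than as.numeric.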
#3
4
In analogy to is.na, you can use is.infinite to find occurrences of infinite values.
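One caveat: unlike is.na, is.infinite does not, as far as I know, accept a data frame directly, so it has to be applied per column. A minimal sketch with a made-up data frame `toy`, assuming all columns are numeric:

```r
# Hypothetical toy data with one +Inf and one -Inf
toy <- data.frame(a = c(1, Inf, 3), b = c(-Inf, 2, 3))

# Apply is.infinite to each column; the result is a logical matrix
inf_mask <- sapply(toy, is.infinite)

# Row and column positions of every infinite cell
inf_pos <- which(inf_mask, arr.ind = TRUE)
print(inf_pos)

# Optionally turn the infinite cells into NA so they can be handled as missing
toy[inf_mask] <- NA
```

Whether replacing Inf with NA is appropriate depends on where the infinite values came from (for example a division by zero upstream).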
#4
2
Take a look at with, e.g.:
> with(df, df == Inf)
foo bar baz abc ...
[1,] FALSE FALSE TRUE FALSE ...
[2,] FALSE TRUE FALSE FALSE ...
...
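Note that comparing against Inf only catches positive infinity. A small sketch of the difference, using a made-up data frame `toy` with the same column names as above:

```r
# Hypothetical toy data containing both -Inf and +Inf
toy <- data.frame(foo = c(1, -Inf), bar = c(Inf, 2))

# Comparing with Inf flags +Inf only
eq_inf <- toy == Inf
print(eq_inf)

# is.infinite on the numeric matrix flags both +Inf and -Inf
any_inf <- is.infinite(as.matrix(toy))
print(any_inf)
```

Here `eq_inf` misses the -Inf in `foo`, while `any_inf` reports it.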
#5
1
joran's answer is what you want, and it is informative. For more details about is.na() and is.infinite(), check out https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/is.na-methods.html. Besides, once you have the logical vector that says whether each element of the original vector is NA/Inf, you can use the which() function to get the indices, like this:
> v1 <- c(1, Inf, 2, NaN, Inf, 3, NaN, Inf)
> is.infinite(v1)
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
> which(is.infinite(v1))
[1] 2 5 8
> is.na(v1)
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
> which(is.na(v1))
[1] 4 7
The documentation for which() is here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/any.html