I am trying to step through a vector to find the outliers, using the IQR to calculate a range. When I run this script looking for values to the right of the range I get results, but when I run it for the left I get the error: missing value where TRUE/FALSE needed. How can I scrub out the TRUE/FALSE in my dataset? Here is my script:
data = c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q3 <- quantile(data, 0.75) # third quartile of the vector
Q1 <- quantile(data, 0.25) # first quartile of the vector
outliers_left <-(Q1-1.5*IQR(data))
outliers_right <-(Q3+1.5*IQR(data))
IQR <- IQR(data)
paste("the interquartile range is", IQR)
Q1 # quartile at 0.25
Q3 # quartile at 0.75
# show the range of numbers we have
paste("your range is", outliers_left, "through", outliers_right, "to determine outliers")
# count how many elements there are, then pass this value into a loop to look for
# anything above and below the outlier range
vectorCount <- sum(!is.na(data))
i <- 1
while( i <= vectorCount ){
  i <- i + 1
  x <- data[i]
  # if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
  if(x > outliers_right) {print(x)}
}
The output and the error I get are:
[1] 167
[1] 180
[1] 156
Error in if (x > outliers_right) { :
missing value where TRUE/FALSE needed
As you can see if you run this script, it finds my 3 outliers on the right and then throws the error. But when I run it again for the left of my range, where I do have an outlier of 100 in the vector, I just get the error without any other results being displayed. How can I fix this script? Any help is greatly appreciated. I've been scouring the web and my books for days on how to fix this.
2 Answers
#1
3
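As noted in the comments, the error is due to the way you've constructed your while loop. With the condition i <= vectorCount, the last iteration increments i to 16 though there are only 15 elements to process, so data[16] is NA and the if has nothing to test. Note also that because i is incremented before data[i] is read, the loop starts at data[2], so the first element (100) is never tested; that is why the left-hand test prints nothing before the error. Changing from i <= vectorCount to i < vectorCount gets rid of the error: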
i <- 1
while( i < vectorCount ){
  i <- i + 1
  x <- data[i]
  # if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
  if(x > outliers_right) {print(x)}
}
#-----
[1] 167
[1] 180
[1] 156
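If you do want to keep an explicit loop, here is a minimal sketch that visits every element, including the first one. It uses a for loop over seq_along(data) instead of a hand-maintained counter; the names lower, upper and found are just illustrative and not from the original script.

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

# Fences at 1.5 * IQR beyond the quartiles (unname() drops the "25%"/"75%" labels)
lower <- unname(quantile(data, 0.25)) - 1.5 * IQR(data)
upper <- unname(quantile(data, 0.75)) + 1.5 * IQR(data)

found <- c()                      # illustrative accumulator for the outliers
for (i in seq_along(data)) {      # i runs 1..length(data), never past the end
  x <- data[i]
  if (x < lower || x > upper) {
    found <- c(found, x)
  }
}
print(found)
```

Because seq_along() stops at the last valid index, data[i] is never NA here, so the if always has a TRUE/FALSE to work with.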
However, this is really not how R works, and you'll soon be frustrated at how long that code takes to run on data of any appreciable size. R is "vectorized", meaning that you can operate on all 15 elements of data at once. To print your outliers, I'd do this:
data[data > outliers_right]
#-----
[1] 167 180 156
Or to get all of them at once using the OR operator:
data[data < outliers_left | data > outliers_right]
#-----
[1] 100 167 180 156
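If you need this more than once, the same vectorized filter can be wrapped in a small helper. This is only a sketch: the function name iqr_outliers and its k argument are made up for illustration, and it assumes you want NA values dropped before the quartiles are computed.

```r
# Hypothetical helper: returns the values of x lying more than k * IQR
# below the first quartile or above the third quartile.
iqr_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]                               # assumption: ignore missing values
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # q[1] = Q1, q[2] = Q3
  spread <- q[2] - q[1]                           # the interquartile range
  x[x < q[1] - k * spread | x > q[2] + k * spread]
}

data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
result <- iqr_outliers(data)
print(result)
```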
For a little context, the logical comparisons above create a TRUE/FALSE value for each element of data, and R returns only the elements for which that value is TRUE. You can check this for yourself by typing:
data > outliers_right
#----
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
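That logical vector is useful on its own, too. As a small sketch (with the upper fence of 128 for this data hard-coded, and flags as an illustrative name), sum() counts the TRUE values and which() gives their positions:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
upper <- 128                 # Q3 + 1.5 * IQR for this data, hard-coded for brevity

flags <- data > upper        # one TRUE/FALSE per element
n_high <- sum(flags)         # TRUE counts as 1, FALSE as 0
positions <- which(flags)    # indices of the TRUE elements

print(n_high)
print(positions)
```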
The [ bit is actually an extraction operator, used to retrieve a subset of a data object. See the help page for some good background: ?"["
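To make the extraction operator a bit more concrete, here is a short sketch of the three most common ways to index a vector with [ (the variable names are illustrative):

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

by_position <- data[c(1, 12)]      # positive integers pick those positions
without_first <- data[-1]          # negative integers drop those positions
by_condition <- data[data > 150]   # a logical vector keeps the TRUE positions

print(by_position)
print(length(without_first))
print(by_condition)
```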
#2
1
The error message arises because you let i <= vectorCount, so i can equal vectorCount on entering the loop body. After i <- i + 1, indexing data with i then gives NA, and the if statement fails.
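You can reproduce the failure in isolation. The sketch below reads one position past the end of the vector and catches the resulting error with tryCatch; the 128 is the upper fence for this data, hard-coded for brevity, and the variable names are illustrative.

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

x <- data[16]            # indexing past the end gives NA, not an error
print(is.na(x))

# NA > 128 is NA, and if() cannot decide between TRUE and FALSE
result <- tryCatch(
  if (x > 128) "outlier" else "ok",
  error = function(e) conditionMessage(e)
)
print(result)

# Guarding with is.na() lets the test go through
guarded <- if (!is.na(x) && x > 128) "outlier" else "ok"
print(guarded)
```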
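If you want to find the outliers based on the IQR, you can use findInterval with your two fences as the break points (passing c(Q1, Q3) instead would flag everything outside the middle 50% of the data):
outliers <- data[findInterval(data, c(outliers_left, outliers_right)) != 1]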
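To see what findInterval returns, here is a small sketch with the 1.5 * IQR fences for this data (120 and 128) hard-coded as break points. It gives 0 for values below the first break, 1 for values between the breaks, and 2 for values at or above the second break, so anything that is not 1 lies outside the fences.

```r
breaks <- c(120, 128)                        # lower and upper fence for this data
probe <- c(100, 120, 125, 128, 156)          # illustrative test values
bins <- findInterval(probe, breaks)
print(bins)

data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers <- data[findInterval(data, breaks) != 1]
print(outliers)
```

One difference from the explicit comparison: by default the intervals are closed on the left, so a value exactly equal to the upper fence lands in bin 2 here, while data > outliers_right would not flag it. No value in this data sits on a fence, so both approaches agree.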
I would also stop using paste to create character messages to be printed; use message instead.
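As a rough sketch of the difference: paste only builds a string, which shows up at the top level because R auto-prints it, but not from inside a loop or function. message concatenates its arguments and writes them to stderr wherever it is called. The variable names below are illustrative.

```r
iqr_value <- 2

# paste() returns a character string; nothing is shown unless it is printed
label <- paste("the interquartile range is", iqr_value)

# message() emits the text directly (arguments are joined without a separator)
message("the interquartile range is ", iqr_value)

# The emitted text can be captured to confirm what message() produced
captured <- tryCatch(
  message("the interquartile range is ", iqr_value),
  message = function(m) conditionMessage(m)
)
```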