I am writing a function that needs to check whether (and which!) column (variable) has all missing values (NA, <NA>). The following is a fragment of the function:
test1 <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
test2 <- data.frame (matrix(c(1,2,3,NA,NA,NA,NA,NA,2), 3,3))
na.test <- function (data) {
if (colSums(!is.na(data) == 0)){
stop ("Some variable in the dataset has all missing values,
remove the column to proceed")
}
}
na.test (test1)
Warning message:
In if (colSums(!is.na(data) == 0)) { :
the condition has length > 1 and only the first element will be used
Q1: Why does the above warning occur, and how can I fix it?
Q2: Is there any way to find which columns are all NA, for example by outputting a list of variable names or column numbers?
5 answers
#1
28
This is easy enough to do with sapply and a small anonymous function:
sapply(test1, function(x)all(is.na(x)))
X1 X2 X3
FALSE FALSE FALSE
sapply(test2, function(x)all(is.na(x)))
X1 X2 X3
FALSE TRUE FALSE
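To answer Q2 directly, the named logical vector returned by sapply can be turned into column positions or names with which(). A minimal sketch, reusing test2 from the question (redefined so the snippet runs on its own); all_na is just an illustrative variable name:

```r
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

# TRUE for each column whose values are all NA
all_na <- sapply(test2, function(x) all(is.na(x)))

which(all_na)         # named column positions: c(X2 = 2L)
names(which(all_na))  # column names only: "X2"
```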
And inside a function:
na.test <- function (x) {
w <- sapply(x, function(x)all(is.na(x)))
if (any(w)) {
stop(paste("All NA in columns", paste(which(w), collapse=", ")))
}
}
na.test(test1)
na.test(test2)
Error in na.test(test2) : All NA in columns 2
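On Q1 specifically: the warning arises because the condition passed to if() is a vector with one element per column, while if() needs a single TRUE or FALSE (since R 4.2.0 this is an error rather than a warning). The parenthesis is also misplaced: ! binds more loosely than == in R, so !is.na(data) == 0 parses as !(is.na(data) == 0), which is just is.na(data), and colSums() then counts the NAs in each column. A minimal sketch of the question's function with the parenthesis moved and the per-column result collapsed with any(); the variable names and the error wording are illustrative, not from the question:

```r
na.test <- function(data) {
  # Count the non-missing values in each column
  non_missing <- colSums(!is.na(data))
  # A column is entirely NA when it has zero non-missing values
  all_na <- non_missing == 0
  # any() collapses the per-column vector to the single TRUE/FALSE that if() needs
  if (any(all_na)) {
    stop("All values are missing in column(s): ",
         paste(names(data)[all_na], collapse = ", "),
         ". Remove them to proceed.")
  }
  invisible(data)
}

test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na.test(test1)       # passes silently
try(na.test(test2))  # reports the all-NA column X2
```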
#2
6
In dplyr:
library(dplyr)

ColNums_NotAllMissing <- function(df){ # helper function
  as.vector(which(colSums(is.na(df)) != nrow(df)))
}
df %>%
  select(ColNums_NotAllMissing(.))
Example:
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))
x %>%
  select(ColNums_NotAllMissing(.))
Or, the other way around:
Cols_AllMissing <- function(df){ # helper function
  as.vector(which(colSums(is.na(df)) == nrow(df)))
}
x %>%
  select(-Cols_AllMissing(.))
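If you would rather not depend on dplyr, the same filtering can be done in base R. A minimal sketch, assuming the example data frame x from above (redefined so the snippet is self-contained). It uses a logical index with drop = FALSE rather than negative positions, because in base R x[, -integer(0)] selects zero columns when no column is entirely NA:

```r
x <- data.frame(x = c(NA, NA, NA), y = c(1, 2, NA), z = c(5, 6, 7))

# TRUE for columns in which every value is NA
all_na <- colSums(is.na(x)) == nrow(x)

kept    <- x[, !all_na, drop = FALSE]  # columns that are not all missing
dropped <- names(x)[all_na]            # names of the all-NA columns

# Pitfall with negative positions: when nothing matches, -integer(0)
# is still integer(0), so every column disappears
yz   <- x[, c("y", "z")]
none <- which(colSums(is.na(yz)) == nrow(yz))  # no column is all NA here
ncol(yz[, -none])                              # 0, not 2
```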
#3
5
To find the columns with all values missing:
allmisscols <- apply(dataset, 2, function(x) all(is.na(x)))
colswithallmiss <- names(which(allmisscols))
print("the columns with all values missing")
print(colswithallmiss)
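apply() first converts the data frame to a matrix, coercing all columns to a common type and copying the data; for an all-NA check this generally gives the same answer, but a column-wise function avoids the conversion. A sketch using vapply(), which also guarantees a logical result even for a data frame with no columns (where sapply() returns an empty list). The contents of dataset are made up for illustration:

```r
# A small made-up data frame with mixed column types
dataset <- data.frame(
  id    = 1:3,
  label = c(NA, NA, NA),
  score = c(2.5, NA, 4.0),
  stringsAsFactors = FALSE
)

# vapply() checks each column in its own type and always returns a logical vector
allmisscols <- vapply(dataset, function(col) all(is.na(col)), logical(1))
colswithallmiss <- names(allmisscols)[allmisscols]

print("the columns with all values missing")
print(colswithallmiss)

# With zero columns, vapply() still returns a logical vector
vapply(dataset[0], function(col) all(is.na(col)), logical(1))
```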
#4
1
To test whether columns have all missing values:
apply(test1, 2, function(x) all(is.na(x)))
To keep only the columns that have no missing values at all:
# drop = FALSE keeps a data frame even if only one column remains (as with test1)
test1.nona <- test1[, colSums(is.na(test1)) == 0, drop = FALSE]
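The subsetting above keeps the complete columns. To list the columns that are entirely missing, as Q2 asks, compare each column's NA count with the number of rows. A sketch using test2 from the question (redefined here); na_counts and all_na_cols are illustrative names:

```r
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na_counts <- colSums(is.na(test2))  # number of NAs in each column

# Column names and column numbers of the all-NA columns
all_na_cols <- names(test2)[na_counts == nrow(test2)]
all_na_nums <- unname(which(na_counts == nrow(test2)))

all_na_cols  # "X2"
all_na_nums  # 2
```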
#5
0
The following command gives a named logical vector marking the columns that contain at least one NA value:
sapply(dataframe, function(x) any(is.na(x)))
Unlike the first answer, this flags columns with any NA rather than only columns that are entirely NA, so use whichever matches what you need. (Wrapping any() in all() adds nothing, because any() already returns a single value.)
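If an actual table is what you are after, the per-column NA count can be combined with both checks in one data frame, which shows the any() versus all() difference side by side. A sketch with made-up example data; na_summary and its column names are illustrative:

```r
dataframe <- data.frame(x1 = c(1, NA, 3), x2 = c(NA, NA, NA), x3 = c(1, 2, 3))

na_summary <- data.frame(
  n_missing   = colSums(is.na(dataframe)),                                 # NA count per column
  any_missing = vapply(dataframe, function(x) any(is.na(x)), logical(1)),  # at least one NA
  all_missing = vapply(dataframe, function(x) all(is.na(x)), logical(1))   # entirely NA
)
na_summary
```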