Is there a function that keeps the rows containing at least one NA and discards the rows with no NAs, i.e. the opposite of na.omit()? I tried !na.omit(), but that didn't work.
2 Answers
#1
13
Use the negation of complete.cases(), i.e. !complete.cases(x), as a logical row index.
Adapted from ?complete.cases:
data(airquality)
head(airquality[!complete.cases(airquality), ])
   Ozone Solar.R Wind Temp Month Day
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
10    NA     194  8.6   69     5  10
11     7      NA  6.9   74     5  11
25    NA      66 16.6   57     5  25
26    NA     266 14.9   58     5  26
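As a small self-contained check of the same idea, here is a sketch on a made-up data frame (`df` and its columns are hypothetical, not taken from the question). It also hints at why !na.omit(x) does not do the job: na.omit() returns the complete rows themselves rather than a logical index, so negating it acts on the data values instead of selecting rows.

```r
# Hypothetical toy data frame; names and values are for illustration only.
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"),
                 c = c(TRUE, FALSE, TRUE, TRUE))

# complete.cases() gives one logical per row: TRUE when the row has no NA.
keep <- !complete.cases(df)

# Rows with at least one NA (rows 2 and 3 of this toy data).
with_na <- df[keep, ]

# na.omit() returns the complementary rows, so the two pieces cover df.
without_na <- na.omit(df)
nrow(with_na) + nrow(without_na) == nrow(df)
```

The final comparison is only a sanity check that the two subsets partition the original rows.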
#2
3
Here is one approach, shown on dummy data in a matrix; it adapts easily to the data frame case.
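mat <- matrix(runif(100), ncol = 10)  # values are drawn before the seed is set, so they vary between runs
set.seed(2)                           # the seed fixes only which cells become NA
mat[sample(100, 10)] <- NA            # set 10 randomly chosen cells to NA
# Note: sample() changed its default algorithm in R 3.6.0, so newer versions pick different cells than shown here.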
We can use is.na() to convert the matrix to a logical one that marks where the NAs are:
> is.na(mat)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [7,] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
 [9,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[10,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
From that we can apply() the any() function over the rows, which returns a logical index that is TRUE for every row with one or more NAs:
> apply(is.na(mat), 1, any)
[1]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
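A vectorised alternative, offered as a sketch rather than as part of the original answer, is to count the NAs in each row with rowSums() and test whether the count is positive. The small matrix `m` below is hypothetical and only there to compare the two indices.

```r
# Hypothetical 3 x 3 matrix with NAs in rows 1 and 3.
m <- matrix(c(NA, 2, 3,
              4,  5, 6,
              7,  8, NA), nrow = 3, byrow = TRUE)

ind_apply   <- apply(is.na(m), 1, any)  # row-wise any() on the logical matrix
ind_rowsums <- rowSums(is.na(m)) > 0    # count NAs per row, then compare with 0

# Both approaches should flag the same rows.
identical(ind_apply, ind_rowsums)
```

rowSums() avoids calling an R function once per row, which may matter on large inputs, while apply() with any() reads closer to the stated intent.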
Putting this together we have:
ind <- apply(is.na(mat), 1, any)
mat[ind, ]
giving:
> ind <- apply(is.na(mat), 1, any)
> mat[ind, ]
          [,1]       [,2]      [,3]        [,4]       [,5]       [,6]
[1,] 0.6618988 0.01041453 0.9817279 0.007109038 0.77002786         NA
[2,] 0.8368892         NA 0.1150841 0.683403423 0.62512173 0.04217553
[3,] 0.1505014 0.86886104 0.1632009 0.929720222         NA 0.18467346
[4,] 0.1492469         NA 0.9746879 0.785878913 0.38814476         NA
[5,] 0.3570626 0.28487057 0.3490884 0.988902156 0.46150111 0.86784466
[6,] 0.9626440         NA 0.5019699 0.613952910 0.21867519 0.40264274
[7,] 0.1323720 0.15046975 0.8103973 0.710185730 0.06593551 0.57268500
           [,7]      [,8]      [,9]     [,10]
[1,] 0.35064257 0.9767552 0.2009347        NA
[2,] 0.02505036 0.3799989 0.9806000 0.3733586
[3,] 0.40110104 0.5603876 0.8289221 0.5743769
[4,] 0.97151543 0.4269434 0.8989719 0.8726963
[5,] 0.32372244        NA 0.4533770 0.1105549
[6,] 0.73319143 0.1153091 0.1474178 0.9527002
[7,]         NA 0.4400317        NA 0.5690021
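For the data frame case mentioned at the start of this answer, one possible adaptation is sketched below. is.na() applied to a data frame returns a logical matrix, so the same row index can be built unchanged. The data frame `dat` and its columns are made up for illustration.

```r
# Hypothetical data frame with mixed column types.
dat <- data.frame(id    = 1:5,
                  score = c(0.2, NA, 0.7, 0.1, NA),
                  grp   = c("a", "b", NA, "d", "e"))

# is.na() on a data frame yields a logical matrix, so apply() works as for mat.
ind <- apply(is.na(dat), 1, any)

# Keep only the rows containing at least one NA.
dat_na <- dat[ind, ]

# The index should agree with the complete.cases() approach.
all(ind == !complete.cases(dat))
```

Here `dat_na` holds rows 2, 3 and 5 of the toy data, the ones with a missing `score` or `grp`.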