Function to keep rows with >= 1 NA value (the opposite of na.omit)

Time: 2023-01-16 15:44:01

Is there a function that keeps rows with at least one NA and discards rows that have no NAs, i.e. the opposite of na.omit()? I tried !na.omit(), but that didn't work.

2 solutions

#1


13  

Use the negation of complete.cases, i.e. !complete.cases(x)

Adapted from ?complete.cases:

data(airquality)
head(airquality[!complete.cases(airquality), ])

   Ozone Solar.R Wind Temp Month Day
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
10    NA     194  8.6   69     5  10
11     7      NA  6.9   74     5  11
25    NA      66 16.6   57     5  25
26    NA     266 14.9   58     5  26
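
As a rough illustration of why negating na.omit() does not help while negating complete.cases() does, here is a minimal sketch on a small made-up data frame (the name df and its values are assumptions for illustration, not taken from the question). na.omit() returns the data itself with the NA rows dropped, not a logical row index, so there is nothing useful to negate.

```r
# Hypothetical toy data frame; names and values are illustrative only.
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"))

# complete.cases() returns one logical per row: TRUE when the row has no NA.
cc <- complete.cases(df)

# Negating it selects the rows that contain at least one NA.
with_na <- df[!cc, ]

# na.omit() keeps exactly the complementary set of rows.
without_na <- na.omit(df)

print(with_na)
```

Together, with_na and without_na should account for every row of df exactly once.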

#2


3  

Here is one approach using dummy data for a matrix, but it can be adapted to the data frame case easily enough.

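mat <- matrix(runif(100), ncol = 10)  # values are drawn before the seed is set, so they differ between runs
set.seed(2)                           # seeds only the sample() call below
mat[sample(100, 10)] <- NA            # set 10 randomly chosen cells to NA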

We can use is.na() to convert the matrix to a logical matrix that is TRUE wherever the original value is NA:

> is.na(mat)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [7,] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
 [9,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[10,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE

From that we can apply() the any() function to the rows, to return a logical index for rows with one or more NAs:

> apply(is.na(mat), 1, any)
 [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
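
A possible alternative to apply(..., 1, any), sketched here on a small made-up matrix (m and its values are assumptions for illustration), is to count the NAs per row with rowSums() and test for a non-zero count. This is not part of the original answer, just an equivalent way to build the same logical index.

```r
# Small made-up matrix with NAs in rows 1 and 3 (values are illustrative only).
m <- matrix(c(1, NA, 3,
              4, 5, 6,
              7, 8, NA),
            nrow = 3, byrow = TRUE)

# Row-wise any(): TRUE for rows with one or more NAs.
ind_apply <- apply(is.na(m), 1, any)

# Vectorised equivalent: count NAs per row and test for a non-zero count.
ind_rowsums <- rowSums(is.na(m)) > 0

# Compare the two indices.
print(ind_apply)
print(ind_rowsums)
```

rowSums() avoids calling an R function once per row, which may matter on large matrices.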

Putting this together we have:

ind <- apply(is.na(mat), 1, any)
mat[ind, ]

giving:

> ind <- apply(is.na(mat), 1, any)
> mat[ind, ]
          [,1]       [,2]      [,3]        [,4]       [,5]       [,6]
[1,] 0.6618988 0.01041453 0.9817279 0.007109038 0.77002786         NA
[2,] 0.8368892         NA 0.1150841 0.683403423 0.62512173 0.04217553
[3,] 0.1505014 0.86886104 0.1632009 0.929720222         NA 0.18467346
[4,] 0.1492469         NA 0.9746879 0.785878913 0.38814476         NA
[5,] 0.3570626 0.28487057 0.3490884 0.988902156 0.46150111 0.86784466
[6,] 0.9626440         NA 0.5019699 0.613952910 0.21867519 0.40264274
[7,] 0.1323720 0.15046975 0.8103973 0.710185730 0.06593551 0.57268500
           [,7]      [,8]      [,9]     [,10]
[1,] 0.35064257 0.9767552 0.2009347        NA
[2,] 0.02505036 0.3799989 0.9806000 0.3733586
[3,] 0.40110104 0.5603876 0.8289221 0.5743769
[4,] 0.97151543 0.4269434 0.8989719 0.8726963
[5,] 0.32372244        NA 0.4533770 0.1105549
[6,] 0.73319143 0.1153091 0.1474178 0.9527002
[7,]         NA 0.4400317        NA 0.5690021
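
For the data frame case mentioned at the start of this answer, the same idiom appears to carry over unchanged, because is.na() on a data frame also returns a logical matrix. A minimal sketch, assuming the built-in airquality data set used in the first answer (the name aq_na is made up for illustration):

```r
# airquality ships with base R (datasets package).
data(airquality)

# is.na() on a data frame returns a logical matrix, so the same idiom applies.
ind <- apply(is.na(airquality), 1, any)

# Keep only rows with at least one NA.
aq_na <- airquality[ind, ]

# Check that the same rows are selected as with the complete.cases() approach.
same_rows <- all(rownames(aq_na) ==
                 rownames(airquality[!complete.cases(airquality), ]))
print(same_rows)
```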
