Subsetting an R data frame produces mysterious NA rows

Time: 2021-12-13 09:11:47

I've been encountering what I think is a bug. It's not a big deal, but I'm curious if anyone else has seen this. Unfortunately, my data is confidential, so I have to make up an example, and it's not going to be very helpful.

When subsetting my data, I occasionally get mysterious NA rows that aren't in my original data frame. Even the row names are NA. For example:

example <- data.frame("var1"=c("A", "B", "A"), "var2"=c("X", "Y", "Z"))
example

  var1 var2
1    A    X
2    B    Y
3    A    Z

Then I run:

example[example$var1=="A",]

  var1 var2
1    A    X
3    A    Z
NA <NA> <NA>

Of course, the example above does not actually give you this mysterious NA row; I am adding it here to illustrate the problem I'm having with my data.

Maybe it has to do with the fact that I'm importing my original data set using Google's read.xlsx package and then executing wide to long reshape before subsetting.

Thanks

6 Answers

#1


28  

Wrap the condition in which(), e.g.:

df[which(df$number1 < df$number2), ]
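
As a rough sketch of why this helps, assuming a small made-up data frame (the values and the names cond, by_logical, and by_which are illustrative only and do not come from the question): comparing against NA gives NA, indexing with a logical NA returns an all-NA row, and which() returns only the positions where the condition is TRUE.

```r
# Hypothetical toy data, shaped like the examples in this thread.
df <- data.frame(name = c("A", "B", "C"),
                 number1 = c(1, 2, 3),
                 number2 = c(10, NA, 1))

cond <- df$number1 < df$number2   # TRUE, NA, FALSE: a comparison with NA gives NA
by_logical <- df[cond, ]          # the NA element yields an extra all-NA row
by_which   <- df[which(cond), ]   # which() keeps only the positions that are TRUE

print(cond)
print(which(cond))
print(by_logical)
print(by_which)
```

Note that which() silently drops the rows where the condition could not be evaluated, so it may be worth checking how many NAs the condition contains before relying on it.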

#2


19  

I see this was already answered by the OP, but since his comment is buried deep within the comment section, here's my attempt to fix this issue (at least with my data, which was behaving the same way).

First of all, some sample data:

> df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))
> df
   name number1 number2
1     A       1      10
2     B       2       9
3     C       3       8
4     D       4       7
5     E       5       6
6     F       6       5
7     G       7       4
8     H       8       3
9     I       9      NA
10    J      10      NA

Now for a simple filter:

> df[df$number1 < df$number2, ]
     name number1 number2
1       A       1      10
2       B       2       9
3       C       3       8
4       D       4       7
5       E       5       6
NA   <NA>      NA      NA
NA.1 <NA>      NA      NA

The problem here is that the NAs in the third column make the comparison evaluate to NA for those rows, and when a data frame is indexed with a logical NA, R returns a row filled with NA instead of dropping it. The number of columns is unchanged. Here's my fix, which requires knowing which column contains the NAs:

> df[df$number1 < df$number2 & !is.na(df$number2), ]
  name number1 number2
1    A       1      10
2    B       2       9
3    C       3       8
4    D       4       7
5    E       5       6
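
A possible variant, sketched here with the same sample data, avoids having to name the column that holds the NAs: store the condition first and drop the positions where the condition itself is NA. The names keep and result are illustrative, not part of the original answer.

```r
df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))

keep <- df$number1 < df$number2   # NA in rows 9 and 10, where number2 is NA
print(keep)

# Treat an unknown (NA) condition as "do not keep", whichever column caused it.
result <- df[!is.na(keep) & keep, ]
print(result)
```

Printing keep before subsetting also makes it easy to see which rows would otherwise turn into NA rows.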

#3


11  

I get the same problem when using code similar to what you posted. With the function subset(),

subset(example, var1 == "A")

the NA row instead gets excluded.
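
A small side-by-side sketch of the difference, assuming var1 contains an NA (the question's own example has none, so an NA is placed in the second row here; the names bracket and sub are illustrative):

```r
example <- data.frame(var1 = c("A", NA, "A"), var2 = c("X", "Y", "Z"))

bracket <- example[example$var1 == "A", ]   # the NA in var1 produces an all-NA row
sub     <- subset(example, var1 == "A")     # subset() treats NA in the condition as FALSE

print(bracket)
print(sub)
```

The help page for subset() states that missing values in the condition are taken as false, which is why the extra row disappears.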

#4


3  

Using dplyr:

library(dplyr)
filter(df, number1 < number2)

#5


0  

Another cause may be that you get the condition wrong, such as checking whether a factor column is equal to a value that is not among its levels. This troubled me for a while.

#6


0  

> example <- data.frame("var1"=c("A", NA, "A"), "var2"=c("X", "Y", "Z"))
> example
  var1 var2
1    A    X
2 <NA>    Y
3    A    Z
> example[example$var1=="A",]
   var1 var2
1     A    X
NA <NA> <NA>
3     A    Z

This is probably the result you are seeing. Try wrapping the condition in which() to avoid the NA rows:

> example[which(example$var1=="A"),]
  var1 var2
1    A    X
3    A    Z
