Subset of rows containing NA (missing) values in a chosen column of a data frame

Time: 2020-12-07 01:37:59

We have a data frame read from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

Var1  Var2 
10   2010/01/01
20   NA
30   2010/03/01

We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in the column Var2. In the example given, only row 2 would be contained in new_DF.

The command

new_DF<-subset(DF,DF$Var2=="NA") 

does not work: the resulting data frame has no rows.

If the NA values in the original CSV file are replaced with NULL, the same command produces the desired result: new_DF<-subset(DF,DF$Var2=="NULL").

How can I get this method to work when the original CSV file uses the character string NA for missing values?

5 solutions

#1


105  

Never use =='NA' to test for missing values. Use is.na() instead. This should do it:

new_DF <- DF[rowSums(is.na(DF)) > 0,]

Or, if you want to check a particular column, you can use

new_DF <- DF[is.na(DF$Var2),]

If your data contains the character string "NA" rather than real missing values, first run

DF[DF=='NA'] <- NA

to replace them with missing values.
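
The replacement step above can be sketched on a small stand-in for the question's data. This is only an illustration: the data frame is built by hand (an assumption, so that Var2 really holds the text "NA" instead of a missing value), and the object names simply follow the question.

```r
# Hand-built stand-in for the question's data; Var2 holds the *string* "NA"
DF <- data.frame(Var1 = c(10, 20, 30),
                 Var2 = c("2010/01/01", "NA", "2010/03/01"),
                 stringsAsFactors = FALSE)

# is.na() finds nothing yet, because "NA" is ordinary text here
any(is.na(DF$Var2))

# Turn every "NA" string in the data frame into a real missing value
DF[DF == "NA"] <- NA

# Now the column-specific test selects the row with the missing date
new_DF <- DF[is.na(DF$Var2), ]
new_DF
```

After the replacement, new_DF holds the single row whose Var1 is 20.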

#2


36  

NA is a special value in R; do not confuse the NA value with the "NA" string. Depending on how the data was imported, your "NA" and "NULL" cells may be of different types (the default behavior is to convert "NA" strings to NA values and to leave "NULL" strings as they are).
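
A small sketch of that default behavior may help. It assumes the file is read with read.csv() and its default na.strings; the inline text= input and the extra "NULL" row are made up for illustration.

```r
# The question's layout with one "NA" cell and one "NULL" cell,
# read inline with the default na.strings = "NA"
DF <- read.csv(text = "Var1,Var2
10,2010/01/01
20,NA
30,NULL", stringsAsFactors = FALSE)

is.na(DF$Var2)      # FALSE  TRUE FALSE: only the "NA" cell became a real NA
DF$Var2 == "NULL"   # FALSE    NA  TRUE: "NULL" is still ordinary text
DF$Var2 == "NA"     # FALSE    NA FALSE: comparing with NA never gives TRUE

# subset() drops rows whose condition is NA, so this result is empty
empty_DF <- subset(DF, Var2 == "NA")
nrow(empty_DF)      # 0
```

This is consistent with what the question reports: the == "NULL" comparison matches a row, while the == "NA" comparison matches none.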

If you use read.table() or read.csv(), consider the na.strings argument to import the data cleanly, and always work with real R NA values.

An example that works for both "NULL" and "NA" cells:

DF <- read.csv("file.csv", na.strings=c("NA", "NULL"))
new_DF <- subset(DF, is.na(DF$Var2))
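
To try this without a file on disk, the same call can be run on inline text. The sketch below is self-contained; the CSV content, including the extra fourth row, is invented for illustration and only the column names come from the question.

```r
# Invented CSV content with both kinds of placeholder for a missing date
csv_text <- "Var1,Var2
10,2010/01/01
20,NULL
30,2010/03/01
40,NA"

# Treat both "NA" and "NULL" as missing while importing
DF <- read.csv(text = csv_text, na.strings = c("NA", "NULL"))

# Keep only the rows whose date is missing
new_DF <- subset(DF, is.na(Var2))
new_DF$Var1
```

Both the "NULL" row and the "NA" row end up in new_DF, so no string comparison is needed afterwards.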

#3


14  

complete.cases() returns TRUE for a row when none of its values is NA, so negating it selects the rows that contain at least one NA:

DF[!complete.cases(DF), ]
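
Note that this looks at every column. A minimal sketch of the difference, using made-up data in which Var1 also has a missing value (an assumption added to make the contrast visible):

```r
# Made-up data: row 2 is missing Var1, row 3 is missing Var2
DF <- data.frame(Var1 = c(10, NA, 30),
                 Var2 = c("2010/01/01", "2010/02/01", NA),
                 stringsAsFactors = FALSE)

# Rows with an NA in any column
any_na <- DF[!complete.cases(DF), ]

# Rows with an NA in Var2 only: pass just that column to complete.cases()
var2_na <- DF[!complete.cases(DF["Var2"]), ]
```

Here any_na contains rows 2 and 3, while var2_na contains only row 3, which matches what the question asks for.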

#4


2  

Try changing it to this:

new_DF<-dplyr::filter(DF,is.na(Var2)) 

#5


-1  

Prints all the rows with NA data:

tmp <- data.frame(c(1,2,3), c(4,NA,5))
# which(..., arr.ind = TRUE) returns the row and column index of each NA
tmp[unique(which(is.na(tmp), arr.ind = TRUE)[, "row"]), ]
